Accelerating Distributed Reinforcement Learning with In-Switch Computing. Y Li, IJ Liu, Y Yuan, D Chen, A Schwing, J Huang. The 46th International Symposium on Computer Architecture (ISCA'19), 2019. | 145 | 2019 |
Pipe-SGD: A Decentralized Pipelined SGD Framework for Distributed Deep Net Training. Y Li, M Yu, S Li, S Avestimehr, NS Kim, A Schwing. Advances in Neural Information Processing Systems (NeurIPS'18), 8045-8056, 2018. | 120 | 2018 |
Energy efficient parallel neuromorphic architectures with approximate arithmetic on FPGA. Q Wang, Y Li, B Shao, S Dey, P Li. Neurocomputing 221, 146-158, 2017. | 106 | 2017 |
A Network-Centric Hardware/Algorithm Co-Design to Accelerate Distributed Training of Deep Neural Networks. Y Li, J Park, M Alian, Y Yuan, Z Qu, P Pan, R Wang, A Schwing, et al. The 51st International Symposium on Microarchitecture (MICRO'18), 175-188, 2018. | 102 | 2018 |
GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training. M Yu, Z Lin, K Narra, S Li, Y Li, NS Kim, A Schwing, M Annavaram, et al. Advances in Neural Information Processing Systems (NeurIPS'18), 5123-5133, 2018. | 82 | 2018 |
DeepStore: In-Storage Acceleration for Intelligent Queries. VS Mailthody, Z Qureshi, W Liang, Z Feng, SG Gonzalo, Y Li, H Franke, et al. The 52nd International Symposium on Microarchitecture (MICRO'19), 2019. | 77 | 2019 |
BNS-GCN: Efficient full-graph training of graph convolutional networks with partition-parallelism and random boundary node sampling. C Wan, Y Li, A Li, NS Kim, Y Lin. Fifth Conference on Machine Learning and Systems (MLSys'22), 2022. | 67 | 2022 |
PipeGCN: Efficient full-graph training of graph convolutional networks with pipelined feature communication. C Wan, Y Li, CR Wolfe, A Kyrillidis, NS Kim, Y Lin. arXiv preprint arXiv:2203.10428, 2022. | 65 | 2022 |
Liquid state machine based pattern recognition on FPGA with firing-activity dependent power gating and approximate computing. Q Wang, Y Li, P Li. The IEEE International Symposium on Circuits and Systems (ISCAS'16), 361-364, 2016. | 60 | 2016 |
Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers. Y Li, A Phanishayee, D Murray, J Tarnawski, NS Kim. arXiv preprint arXiv:2202.01306, 2022. | 20 | 2022 |
Visage: enabling timely analytics for drone imagery. S Jha, Y Li, S Noghabi, V Ranganathan, P Kumar, A Nelson, M Toelle, et al. The 27th Annual International Conference on Mobile Computing and Networking (MobiCom'21) …, 2021. | 17 | 2021 |
Doing More with Less: Training Large DNN Models on Commodity Servers for the Masses. Y Li, A Phanishayee, D Murray, NS Kim. Hot Topics in Operating Systems (HotOS'21), 2021. | 6 | 2021 |
BDS-GCN: Efficient full-graph training of graph convolutional nets with partition-parallelism and boundary sampling. C Wan, Y Li, NS Kim, Y Lin. | 2 | 2020 |
Energy Efficient Spiking Neuromorphic Architectures for Pattern Recognition. Y Li. Master's Thesis, ECE, Texas A&M University, 2016. | 1 | 2016 |
Communication-Centric Cross-Stack Acceleration for Distributed Machine Learning. Y Li. Ph.D. Dissertation, ECE, UIUC, 2022. | | 2022 |