Sirius: An open end-to-end voice and vision personal assistant and its implications for future warehouse scale computers. J Hauswald, MA Laurenzano, Y Zhang, C Li, A Rovinski, A Khurana, et al. Proceedings of the Twentieth International Conference on Architectural …, 2015. Cited by 337.
Stochastic circuits for real-time image-processing applications. A Alaghi, C Li, JP Hayes. Proceedings of the 50th Annual Design Automation Conference, 1-6, 2013. Cited by 314.
DjiNN and Tonic: DNN as a service and its implications for future warehouse scale computers. J Hauswald, Y Kang, MA Laurenzano, Q Chen, C Li, T Mudge, et al. ACM SIGARCH Computer Architecture News 43 (3S), 27-40, 2015. Cited by 199.
DeepSpeed-Inference: Enabling efficient inference of transformer models at unprecedented scale. RY Aminabadi, S Rajbhandari, AA Awan, C Li, D Li, E Zheng, O Ruwase, et al. SC22: International Conference for High Performance Computing, Networking …, 2022. Cited by 168.
Accelerating reduction and scan using tensor core units. A Dakkak, C Li, J Xiong, I Gelado, W Hwu. Proceedings of the ACM International Conference on Supercomputing, 46-57, 2019. Cited by 91.
KLAP: Kernel launch aggregation and promotion for optimizing dynamic parallelism. I El Hajj, J Gómez-Luna, C Li, LW Chang, D Milojicic, W Hwu. 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture …, 2016. Cited by 43.
Evaluating characteristics of CUDA communication primitives on high-bandwidth interconnects. C Pearson, A Dakkak, S Hashash, C Li, IH Chung, J Xiong, WM Hwu. Proceedings of the 2019 ACM/SPEC International Conference on Performance …, 2019. Cited by 37.
XSP: Across-stack profiling and analysis of machine learning models on GPUs. C Li, A Dakkak, J Xiong, W Wei, L Xu, W Hwu. 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2020. Cited by 32*.
Designing future warehouse-scale computers for Sirius, an end-to-end voice and vision personal assistant. J Hauswald, MA Laurenzano, Y Zhang, H Yang, Y Kang, C Li, A Rovinski, et al. ACM Transactions on Computer Systems (TOCS) 34 (1), 1-32, 2016. Cited by 32.
A comprehensive study on post-training quantization for large language models. Z Yao, C Li, X Wu, S Youn, Y He. arXiv preprint arXiv:2303.08302, 2023. Cited by 30.
TrIMS: Transparent and isolated model sharing for low latency deep learning inference in function-as-a-service. A Dakkak, C Li, SG De Gonzalo, J Xiong, W Hwu. 2019 IEEE 12th International Conference on Cloud Computing (CLOUD), 372-382, 2019. Cited by 30.
ZeroQuant-V2: Exploring post-training quantization in LLMs from comprehensive study to low rank compensation. Z Yao, X Wu, C Li, S Youn, Y He. arXiv preprint arXiv:2303.08302, 2023. Cited by 23.
AI Matrix: A deep learning benchmark for Alibaba data centers. W Zhang, W Wei, L Xu, L Jin, C Li. arXiv preprint arXiv:1909.10562, 2019. Cited by 21.
Understanding INT4 quantization for transformer models: Latency speedup, composability, and failure cases. X Wu, C Li, RY Aminabadi, Z Yao, Y He. arXiv preprint arXiv:2301.12017, 2023. Cited by 19.
Frustrated with replicating claims of a shared model? A solution. A Dakkak, C Li, J Xiong, WM Hwu. arXiv preprint arXiv:1811.09737, 2018. Cited by 16*.
Matrix factorization on GPUs with memory optimization and approximate computing. W Tan, S Chang, L Fong, C Li, Z Wang, L Cao. Proceedings of the 47th International Conference on Parallel Processing, 1-10, 2018. Cited by 16.
Visual domain adaptation with manifold embedded distribution alignment. Y Wang, W Feng, Y Chen, H Yu, M Huang, PS Yu. ACM Multimedia, 402-410, 2018. Cited by 15.
Understanding INT4 quantization for language models: Latency speedup, composability, and failure cases. X Wu, C Li, RY Aminabadi, Z Yao, Y He. International Conference on Machine Learning, 37524-37539, 2023. Cited by 11.
MPress: Democratizing billion-scale model training on multi-GPU servers via memory-saving inter-operator parallelism. Q Zhou, H Wang, X Yu, C Li, Y Bai, F Yan, Y Xu. 2023 IEEE International Symposium on High-Performance Computer Architecture …, 2023. Cited by 11.
Random-LTD: Random and layerwise token dropping brings efficient training for large-scale transformers. Z Yao, X Wu, C Li, C Holmes, M Zhang, C Li, Y He. arXiv preprint arXiv:2211.11586, 2022. Cited by 11.