HAWQ-V2: Hessian aware trace-weighted quantization of neural networks | Z Dong, Z Yao, D Arfeen, A Gholami, MW Mahoney, K Keutzer | Advances in Neural Information Processing Systems 33, 2020 | 300 | 2020 |
SpecInfer: Accelerating large language model serving with tree-based speculative inference and verification | X Miao, G Oliaro, Z Zhang, X Cheng, Z Wang, Z Zhang, RYY Wong, A Zhu, ... | Proceedings of the 29th ACM International Conference on Architectural …, 2024 | 171 | 2024 |
Large batch size training of neural networks with adversarial training and second-order information | Z Yao, A Gholami, D Arfeen, R Liaw, J Gonzalez, K Keutzer, M Mahoney | arXiv preprint arXiv:1810.01021, 2018 | 55 | 2018 |
Sia: Heterogeneity-aware, goodput-optimized ML-cluster scheduling | S Jayaram Subramanya, D Arfeen, S Lin, A Qiao, Z Jia, GR Ganger | Proceedings of the 29th Symposium on Operating Systems Principles, 642-657, 2023 | 40 | 2023 |
GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism | B Jeon, M Wu, S Cao, S Kim, S Park, N Aggarwal, C Unger, D Arfeen, ... | arXiv preprint arXiv:2406.17145, 2024 | 3 | 2024 |
PipeFill: Using GPUs During Bubbles in Pipeline-parallel LLM Training | D Arfeen, Z Zhang, X Fu, GR Ganger, Y Wang | arXiv preprint arXiv:2410.07192, 2024 | | 2024 |