Optimus: an efficient dynamic resource scheduler for deep learning clusters Y Peng, Y Bao, Y Chen, C Wu, C Guo Proceedings of the Thirteenth EuroSys Conference, 1-14, 2018 | 472 | 2018 |
A generic communication scheduler for distributed DNN training acceleration Y Peng, Y Zhu, Y Chen, Y Bao, B Yi, C Lan, C Wu, C Guo Proceedings of the 27th ACM Symposium on Operating Systems Principles, 16-29, 2019 | 337 | 2019 |
Deep learning-based job placement in distributed machine learning clusters Y Bao, Y Peng, C Wu IEEE INFOCOM 2019-IEEE conference on computer communications, 505-513, 2019 | 145 | 2019 |
Online job scheduling in distributed machine learning clusters Y Bao, Y Peng, C Wu, Z Li IEEE INFOCOM 2018-IEEE Conference on Computer Communications, 495-503, 2018 | 126 | 2018 |
DL2: A deep learning-driven scheduler for deep learning clusters Y Peng, Y Bao, Y Chen, C Wu, C Meng, W Lin IEEE Transactions on Parallel and Distributed Systems 32 (8), 1947-1960, 2021 | 81 | 2021 |
Preemptive all-reduce scheduling for expediting distributed DNN training Y Bao, Y Peng, Y Chen, C Wu IEEE INFOCOM 2020-IEEE Conference on Computer Communications, 626-635, 2020 | 63 | 2020 |
BGL: GPU-Efficient GNN training by optimizing graph data I/O and preprocessing T Liu, Y Chen, D Li, C Wu, Y Zhu, J He, Y Peng, H Chen, H Chen, C Guo 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2023 | 56 | 2023 |
deTector: a Topology-aware Monitoring System for Data Center Networks Y Peng, J Yang, C Wu, C Guo, C Hu, Z Li 2017 USENIX Annual Technical Conference (USENIX ATC 17), 55-68, 2017 | 40 | 2017 |
Multi-resource interleaving for deep learning training Y Zhao, Y Liu, Y Peng, Y Zhu, X Liu, X Jin Proceedings of the ACM SIGCOMM 2022 Conference, 428-440, 2022 | 39 | 2022 |
Elastic parameter server load distribution in deep learning clusters Y Chen, Y Peng, Y Bao, C Wu, Y Zhu, C Guo Proceedings of the 11th ACM Symposium on Cloud Computing, 507-521, 2020 | 38 | 2020 |
Dynamic scaling of virtualized, distributed service chains: A case study of IMS J Duan, C Wu, F Le, AX Liu, Y Peng IEEE Journal on Selected Areas in Communications 35 (11), 2501-2511, 2017 | 37 | 2017 |
MegaScale: Scaling large language model training to more than 10,000 GPUs Z Jiang, H Lin, Y Zhong, Q Huang, Y Chen, Z Zhang, Y Peng, X Li, C Xie, ... 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2024 | 26 | 2024 |
Deep learning-based job placement in distributed machine learning clusters with heterogeneous workloads Y Bao, Y Peng, C Wu IEEE/ACM Transactions on Networking 31 (2), 634-647, 2022 | 13 | 2022 |
SP-GNN: Learning structure and position information from graphs Y Chen, J You, J He, Y Lin, Y Peng, C Wu, Y Zhu Neural Networks 161, 505-514, 2023 | 9 | 2023 |
dPRO: A generic performance diagnosis and optimization toolkit for expediting distributed DNN training H Hu, C Jiang, Y Zhong, Y Peng, C Wu, Y Zhu, H Lin, C Guo Proceedings of Machine Learning and Systems 4, 623-637, 2022 | 9 | 2022 |
SAPipe: Staleness-aware pipeline for data parallel DNN training Y Chen, C Xie, M Ma, J Gu, Y Peng, H Lin, C Wu, Y Zhu Advances in Neural Information Processing Systems 35, 17981-17993, 2022 | 7 | 2022 |
LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization J Zhao, B Wan, Y Peng, H Lin, C Wu arXiv preprint arXiv:2403.01136, 2024 | 4 | 2024 |
CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs H Hu, J Su, J Zhao, Y Peng, Y Zhu, H Lin, C Wu Proceedings of the Nineteenth European Conference on Computer Systems, 1054-1074, 2024 | 1 | 2024 |
dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training H Hu, C Jiang, Y Zhong, Y Peng, C Wu, Y Zhu, H Lin, C Guo arXiv preprint arXiv:2205.02473, 2022 | 1 | 2022 |
QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices J Zhao, B Wan, Y Peng, H Lin, Y Zhu, C Wu 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2024 | | 2024 |