Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism. X Miao, Y Wang, Y Jiang, C Shi, X Nie, H Zhang, B Cui. arXiv preprint arXiv:2211.13878, 2022. Cited by 53.
DeGNN: Improving Graph Neural Networks with Graph Decomposition. X Miao, NM Gürel, W Zhang, Z Han, B Li, W Min, SX Rao, H Ren, Y Shan, ... Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data …, 2021. Cited by 28.
Improving Automatic Parallel Training via Balanced Memory Workload Optimization. Y Wang, Y Jiang, X Miao, F Fu, S Zhu, X Nie, Y Tu, B Cui. IEEE Transactions on Knowledge and Data Engineering, 2024. Cited by 8.
DeGNN: Characterizing and Improving Graph Neural Networks with Graph Decomposition. X Miao, NM Gürel, W Zhang, Z Han, B Li, W Min, X Rao, H Ren, Y Shan, ... arXiv preprint arXiv:1910.04499, 2019. Cited by 4.
Enabling Parallelism Hot Switching for Efficient Training of Large Language Models. H Ge, F Fu, H Li, X Wang, S Lin, Y Wang, X Nie, H Zhang, X Miao, B Cui. Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles …, 2024. Cited by 1.
Efficient Multi-Task Large Model Training via Data Heterogeneity-aware Model Management. Y Wang, S Zhu, F Fu, X Miao, J Zhang, J Zhu, F Hong, Y Li, B Cui. arXiv preprint arXiv:2409.03365, 2024. Cited by 1.
Data-Centric and Heterogeneity-Adaptive Sequence Parallelism for Efficient LLM Training. Y Wang, S Wang, S Zhu, F Fu, X Liu, X Xiao, H Li, J Li, F Wu, B Cui. arXiv preprint arXiv:2412.01523, 2024.
Malleus: Straggler-Resilient Hybrid Parallel Training of Large-scale Models via Malleable Data and Model Parallelization. H Li, F Fu, H Ge, S Lin, X Wang, J Niu, Y Wang, H Zhang, X Nie, B Cui. arXiv preprint arXiv:2410.13333, 2024.
Graph Neural Network Training Acceleration for Multi-GPUs. X Miao, Y Wang, J Shen, Y Shao, B Cui. Journal of Software 34 (9), 4407-4420, 2023.