PipeSwitch: Fast pipelined context switching for deep learning applications. Z Bai, Z Zhang, Y Zhu, X Jin. 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2020. Cited by 93.
Is network the bottleneck of distributed training? Z Zhang, C Chang, H Lin, Y Wang, R Arora, X Jin. Proceedings of the Workshop on Network Meets AI & ML, 8-13, 2020. Cited by 71.
MiCS: Near-linear scaling for training gigantic model on public cloud. Z Zhang, S Zheng, Y Wang, J Chiu, G Karypis, T Chilimbi, M Li, X Jin. arXiv preprint arXiv:2205.00119, 2022. Cited by 25.
Gemini: Fast failure recovery in distributed training with in-memory checkpoints. Z Wang, Z Jia, S Zheng, Z Zhang, X Fu, TSE Ng, Y Wang. Proceedings of the 29th Symposium on Operating Systems Principles, 364-381, 2023. Cited by 19.
Oobleck: Resilient distributed training of large models using pipeline templates. I Jang, Z Yang, Z Zhang, X Jin, M Chowdhury. Proceedings of the 29th Symposium on Operating Systems Principles, 382-395, 2023. Cited by 16.
TKPERM: Cross-platform permission knowledge transfer to detect overprivileged third-party applications. FH Shezan, K Cheng, Z Zhang, Y Cao, Y Tian. Network and Distributed System Security (NDSS) Symposium, 2020. Cited by 12.
Towards a secure zero-rating framework with three parties. Z Liu, Z Zhang, Y Cao, Z Xi, S Jing, H La Roche. 27th USENIX Security Symposium (USENIX Security 18), 711-728, 2018. Cited by 3.
Decoupled model schedule for deep learning training. H Chen, CH Yu, S Zheng, Z Zhang, Z Zhang, Y Wang. arXiv preprint arXiv:2302.08005, 2023. Cited by 1.
MiCS: Near-linear scaling for training gigantic model on public cloud. Z Zhang, S Zheng, Y Wang, J Chiu, G Karypis, T Chilimbi, M Li, X Jin. Proceedings of the VLDB Endowment.