AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction S Zheng, R Chen, A Wei, Y Jin, Q Han, L Lu, B Wu, X Li, S Yan, Y Liang Proceedings of the 49th Annual International Symposium on Computer …, 2022 | 41 | 2022 |
Fast distributed inference serving for large language models B Wu, Y Zhong, Z Zhang, G Huang, X Liu, X Jin arXiv preprint arXiv:2305.05920, 2023 | 28 | 2023 |
A survey of resource-efficient LLM and multimodal foundation models M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu, Y Zhao, C Yang, S Wang, ... arXiv preprint arXiv:2401.08092, 2024 | 22 | 2024 |
Transparent GPU sharing in container clouds for deep learning workloads B Wu, Z Zhang, Z Bai, X Liu, X Jin 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2023 | 15 | 2023 |
Neoflow: A flexible framework for enabling efficient compilation for high performance DNN training S Zheng, R Chen, Y Jin, A Wei, B Wu, X Li, S Yan, Y Liang IEEE Transactions on Parallel and Distributed Systems 33 (11), 3220-3232, 2021 | 11 | 2021 |
LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism B Wu, S Liu, Y Zhong, P Sun, X Liu, X Jin arXiv preprint arXiv:2404.09526, 2024 | 3 | 2024 |
Xron: A hybrid elastic cloud overlay network for video conferencing at planetary scale B Wu, K Qian, B Li, Y Ma, Q Zhang, Z Jiang, J Zhao, D Cai, E Zhai, X Liu, ... Proceedings of the ACM SIGCOMM 2023 Conference, 696-709, 2023 | 3 | 2023 |
dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving B Wu, R Zhu, Z Zhang, P Sun, X Liu, X Jin | | |