SpotServe: Serving Generative Large Language Models on Preemptible Instances

X Miao, C Shi, J Duan, X Xi, D Lin, B Cui… - Proceedings of the 29th …, 2024 - dl.acm.org
The high computational and memory requirements of generative large language models
(LLMs) make it challenging to serve them cheaply. This paper aims to reduce the monetary …

Metis: Fast Automatic Distributed Training on Heterogeneous GPUs

T Um, B Oh, M Kang, WY Lee, G Kim, D Kim… - 2024 USENIX Annual …, 2024 - usenix.org
As deep learning model sizes expand and new GPUs are released every year, the need for
distributed training on heterogeneous GPUs rises to fully harness under-utilized low-end …

Efficient Training of Large Language Models on Distributed Infrastructures: A Survey

J Duan, S Zhang, Z Wang, L Jiang, W Qu, Q Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) like GPT and LLaMA are revolutionizing the AI industry with
their sophisticated capabilities. Training these models requires vast GPU clusters and …

Sylvie: 3D-Adaptive and Universal System for Large-Scale Graph Neural Network Training

M Zhang, Q Hu, C Wan, H Wang, P Sun… - 2024 IEEE 40th …, 2024 - ieeexplore.ieee.org
Distributed full-graph training of Graph Neural Networks (GNNs) has been widely adopted to
learn large-scale graphs. While recent system advancements can improve the training …

OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning

Y Jiang, F Fu, X Miao, X Nie, B Cui - arXiv preprint arXiv:2209.13258, 2022 - arxiv.org
Large-scale deep learning models contribute to significant performance improvements on
a variety of downstream tasks. Current data and model parallelism approaches utilize model …

Efficient Multi-Task Large Model Training via Data Heterogeneity-aware Model Management

Y Wang, S Zhu, F Fu, X Miao, J Zhang, J Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent foundation models are capable of handling multiple machine learning (ML) tasks
and multiple data modalities with a unified base model structure and several specialized …

FlashFlex: Accommodating Large Language Model Training over Heterogeneous Environment

R Yan, Y Jiang, W Tao, X Nie, B Cui, B Yuan - arXiv preprint arXiv …, 2024 - arxiv.org
Training a large language model (LLM) is a computationally intensive task, which is typically
conducted in data centers with homogeneous high-performance GPUs. This paper explores …

Slice-Level Scheduling for High Throughput and Load Balanced LLM Serving

K Cheng, W Hu, Z Wang, H Peng, J Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) iteratively generate text token by token, with memory usage
increasing with the length of generated token sequences. The unpredictability of generation …

Demystifying Data Management for Large Language Models

X Miao, Z Jia, B Cui - Companion of the 2024 International Conference …, 2024 - dl.acm.org
Navigating the intricacies of data management in the era of Large Language Models (LLMs)
presents both challenges and opportunities for database and data management …

Cephalo: Harnessing Heterogeneous GPU Clusters for Training Transformer Models

RB Guo, U Anand, A Chen, K Daudjee - arXiv preprint arXiv:2411.01075, 2024 - arxiv.org
Training transformer models requires substantial GPU compute and memory resources. In
homogeneous clusters, distributed strategies allocate resources evenly, but this approach is …