SpotServe: Serving generative large language models on preemptible instances
The high computational and memory requirements of generative large language models
(LLMs) make it challenging to serve them cheaply. This paper aims to reduce the monetary …
Metis: Fast Automatic Distributed Training on Heterogeneous GPUs
As deep learning model sizes expand and new GPUs are released every year, the need for
distributed training on heterogeneous GPUs rises to fully harness under-utilized low-end …
Efficient training of large language models on distributed infrastructures: A survey
Large Language Models (LLMs) like GPT and LLaMA are revolutionizing the AI industry with
their sophisticated capabilities. Training these models requires vast GPU clusters and …
Sylvie: 3D-adaptive and universal system for large-scale graph neural network training
Distributed full-graph training of Graph Neural Networks (GNNs) has been widely adopted to
learn large-scale graphs. While recent system advancements can improve the training …
OSDP: Optimal sharded data parallel for distributed deep learning
Large-scale deep learning models contribute to significant performance improvements on
varieties of downstream tasks. Current data and model parallelism approaches utilize model …
Efficient Multi-Task Large Model Training via Data Heterogeneity-aware Model Management
Recent foundation models are capable of handling multiple machine learning (ML) tasks
and multiple data modalities with a unified base model structure and several specialized …
FlashFlex: Accommodating Large Language Model Training over Heterogeneous Environment
Training a large language model (LLM) is a computationally intensive task, which is typically
conducted in data centers with homogeneous high-performance GPUs. This paper explores …
Slice-Level Scheduling for High Throughput and Load Balanced LLM Serving
Large language models (LLMs) iteratively generate text token by token, with memory usage
increasing with the length of generated token sequences. The unpredictability of generation …
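The memory growth this entry refers to comes from the transformer KV cache: each generated token appends a key and a value vector in every layer and attention head, so per-request memory grows linearly with output length while that length is unknown up front. A minimal back-of-the-envelope sketch, using illustrative model dimensions (roughly a 7B-class fp16 model) that are assumptions rather than numbers from the paper:

def kv_cache_bytes(seq_len, num_layers=32, num_kv_heads=32,
                   head_dim=128, bytes_per_elem=2):
    # Each token stores a key and a value vector (factor 2)
    # in every layer and every KV head; fp16 = 2 bytes/element.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * seq_len

for n in (128, 1024, 4096):
    print(f"{n:>5} tokens -> {kv_cache_bytes(n) / 2**20:.0f} MiB")  # 64, 512, 2048 MiB

Because the final output length cannot be known at admission time, a scheduler cannot predict per-request memory in advance, which is the load-balancing difficulty the abstract alludes to.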
Demystifying Data Management for Large Language Models
Navigating the intricacies of data management in the era of Large Language Models (LLMs)
presents both challenges and opportunities for database and data management …
Cephalo: Harnessing Heterogeneous GPU Clusters for Training Transformer Models
R. B. Guo, U. Anand, A. Chen, K. Daudjee. arXiv preprint arXiv:2411.01075, 2024.
Training transformer models requires substantial GPU compute and memory resources. In
homogeneous clusters, distributed strategies allocate resources evenly, but this approach is …
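To make the contrast with even allocation concrete, the sketch below splits a global batch in proportion to each GPU's measured throughput, so faster GPUs receive larger micro-batches and per-step time is roughly equalized. The throughput figures and the proportional rule are illustrative assumptions, not Cephalo's actual policy:

def split_batch(global_batch, throughputs):
    # throughputs: GPU name -> measured samples/sec.
    # An even split stalls fast GPUs behind slow ones; a proportional
    # split sizes each micro-batch to take roughly the same time.
    total = sum(throughputs.values())
    shares = {g: round(global_batch * t / total) for g, t in throughputs.items()}
    drift = global_batch - sum(shares.values())   # repair rounding drift
    shares[max(shares, key=shares.get)] += drift
    return shares

print(split_batch(128, {"A100-0": 300.0, "A100-1": 300.0, "T4-0": 60.0, "T4-1": 60.0}))
# {'A100-0': 53, 'A100-1': 53, 'T4-0': 11, 'T4-1': 11}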