FlexGen: High-throughput generative inference of large language models with a single GPU
The high computational and memory requirements of large language model (LLM) inference
make it feasible only with multiple high-end accelerators. Motivated by the emerging …
A survey on scheduling techniques in computing and network convergence
S Tang, Y Yu, H Wang, G Wang, W Chen… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
The computing demand for massive applications has led to the ubiquitous deployment of
computing power. This trend results in the urgent need for higher-level computing resource …
LLM-Based Edge Intelligence: A Comprehensive Survey on Architectures, Applications, Security and Trustworthiness
The integration of Large Language Models (LLMs) and Edge Intelligence (EI) introduces a
groundbreaking paradigm for intelligent edge devices. With their capacity for human-like …
Petals: Collaborative inference and fine-tuning of large models
Many NLP tasks benefit from using large language models (LLMs) that often have more than
100 billion parameters. With the release of BLOOM-176B and OPT-175B, everyone can …
FusionAI: Decentralized training and deploying LLMs with massive consumer-level GPUs
The rapid growth of memory and computation requirements of large language models
(LLMs) has outpaced the development of hardware, hindering people who lack large-scale …
HexGen: Generative inference of foundation model over heterogeneous decentralized environment
Serving foundation model inference is a pivotal component of contemporary AI applications,
where this service is usually hosted in a centralized data center on a group of homogeneous …
Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey
With the rapid growth in the volume of data sets, models, and devices in the domain of deep
learning, there is increasing attention on large-scale distributed deep learning. In contrast to …
Exploring the robustness of decentralized training for large language models
Decentralized training of large language models has emerged as an effective way to
democratize this technology. However, the potential threats associated with this approach …
Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Large Language Models (LLMs) like GPT and LLaMA are revolutionizing the AI industry with
their sophisticated capabilities. Training these models requires vast GPU clusters and …
Resource Allocation and Workload Scheduling for Large-Scale Distributed Deep Learning: A Survey
With rapidly increasing distributed deep learning workloads in large-scale data centers,
efficient distributed deep learning framework strategies for resource allocation and workload …