Parameter-efficient fine-tuning for large models: A comprehensive survey

Z Han, C Gao, J Liu, J Zhang, SQ Zhang - arXiv preprint arXiv:2403.14608, 2024 - arxiv.org
Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …

Deep learning workload scheduling in GPU datacenters: A survey

Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo… - ACM Computing …, 2024 - dl.acm.org
Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …

PowerInfer: Fast large language model serving with a consumer-grade GPU

Y Song, Z Mi, H Xie, H Chen - Proceedings of the ACM SIGOPS 30th …, 2024 - dl.acm.org
This paper introduces PowerInfer, a high-speed Large Language Model (LLM) inference
engine on a personal computer (PC) equipped with a single consumer-grade GPU. The key …

Fairness in serving large language models

Y Sheng, S Cao, D Li, B Zhu, Z Li, D Zhuo… - … USENIX Symposium on …, 2024 - usenix.org
High-demand LLM inference services (e.g., ChatGPT and Bard) support a wide range of
requests from short chat conversations to long document reading. To ensure that all client …

Towards efficient generative large language model serving: A survey from algorithms to systems

X Miao, G Oliaro, Z Zhang, X Cheng, H Jin… - arXiv preprint arXiv …, 2023 - arxiv.org
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …

LoRAMoE: Alleviating world knowledge forgetting in large language models via MoE-style plugin

S Dou, E Zhou, Y Liu, S Gao, W Shen… - Proceedings of the …, 2024 - aclanthology.org
Supervised fine-tuning (SFT) is a crucial step for large language models (LLMs), enabling
them to align with human instructions and enhance their capabilities in downstream tasks …

A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine …

SplitLoRA: A split parameter-efficient fine-tuning framework for large language models

Z Lin, X Hu, Y Zhang, Z Chen, Z Fang, X Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
The scalability of large language models (LLMs) in handling high-complexity models and
large-scale datasets has led to tremendous successes in pivotal domains. While there is an …

ServerlessLLM: Low-latency serverless inference for large language models

Y Fu, L Xue, Y Huang, AO Brabete… - … Systems Design and …, 2024 - research.ed.ac.uk
This paper presents ServerlessLLM, a distributed system designed to support low-latency
serverless inference for Large Language Models (LLMs). By harnessing the substantial near …

GaLore: Memory-efficient LLM training by gradient low-rank projection

J Zhao, Z Zhang, B Chen, Z Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Training Large Language Models (LLMs) presents significant memory challenges,
predominantly due to the growing size of weights and optimizer states. Common memory …