Parameter-efficient fine-tuning for large models: A comprehensive survey
Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …
Deep learning workload scheduling in GPU datacenters: A survey
Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …
PowerInfer: Fast large language model serving with a consumer-grade GPU
This paper introduces PowerInfer, a high-speed Large Language Model (LLM) inference
engine on a personal computer (PC) equipped with a single consumer-grade GPU. The key …
Fairness in serving large language models
High-demand LLM inference services (e.g., ChatGPT and Bard) support a wide range of
requests from short chat conversations to long document reading. To ensure that all client …
Towards efficient generative large language model serving: A survey from algorithms to systems
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …
LoRAMoE: Alleviating world knowledge forgetting in large language models via MoE-style plugin
Supervised fine-tuning (SFT) is a crucial step for large language models (LLMs), enabling
them to align with human instructions and enhance their capabilities in downstream tasks …
A survey of resource-efficient LLM and multimodal foundation models
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine …
SplitLoRA: A split parameter-efficient fine-tuning framework for large language models
The scalability of large language models (LLMs) in handling high-complexity models and
large-scale datasets has led to tremendous successes in pivotal domains. While there is an …
ServerlessLLM: Low-latency serverless inference for large language models
This paper presents ServerlessLLM, a distributed system designed to support low-latency
serverless inference for Large Language Models (LLMs). By harnessing the substantial near …
GaLore: Memory-efficient LLM training by gradient low-rank projection
Training Large Language Models (LLMs) presents significant memory challenges,
predominantly due to the growing size of weights and optimizer states. Common memory …