A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine …

AlphaZero-like tree-search can guide large language model decoding and training

Z Wan, X Feng, M Wen, SM McAleer, Y Wen… - … on Machine Learning, 2024 - openreview.net
Recent works like Tree-of-Thought (ToT) and Reasoning via Planning (RAP) aim to augment
the multi-step reasoning capabilities of LLMs by using tree-search algorithms. These …

A survey on efficient inference for large language models

Z Zhou, X Ning, K Hong, T Fu, J Xu, S Li, Y Lou… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have attracted extensive attention due to their remarkable
performance across various tasks. However, the substantial computational and memory …

Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention

B Gao, Z He, P Sharma, Q Kang, D Jevdjic… - 2024 USENIX Annual …, 2024 - usenix.org
Interacting with humans through multi-turn conversations is a fundamental feature of large
language models (LLMs). However, existing LLM serving engines executing multi-turn …

RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation

C Jin, Z Zhang, X Jiang, F Liu, X Liu, X Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Retrieval-Augmented Generation (RAG) has shown significant improvements in various
natural language processing tasks by integrating the strengths of large language models …

LLM as a system service on mobile devices

W Yin, M Xu, Y Li, X Liu - arXiv preprint arXiv:2403.11805, 2024 - arxiv.org
Being more powerful and intrusive into user-device interactions, LLMs are eager for on-device
execution to better preserve user privacy. In this work, we propose a new paradigm of …

Optimizing LLM queries in relational workloads

S Liu, A Biswal, A Cheng, X Mo, S Cao… - arXiv preprint arXiv …, 2024 - arxiv.org
Analytical database providers (e.g., Redshift, Databricks, BigQuery) have rapidly added
support for invoking Large Language Models (LLMs) through native user-defined functions …

Rethinking software engineering in the era of foundation models: A curated catalogue of challenges in the development of trustworthy FMware

AE Hassan, D Lin, GK Rajbahadur, K Gallaba… - … Proceedings of the …, 2024 - dl.acm.org
Foundation models (FMs), such as Large Language Models (LLMs), have revolutionized
software development by enabling new use cases and business models. We refer to …

Octopus: On-device language model for function calling of software APIs

W Chen, Z Li, M Ma - arXiv preprint arXiv:2404.01549, 2024 - arxiv.org
In the rapidly evolving domain of artificial intelligence, Large Language Models (LLMs) play
a crucial role due to their advanced text processing and generation abilities. This study …

ALTO: An Efficient Network Orchestrator for Compound AI Systems

K Santhanam, D Raghavan, MS Rahman… - Proceedings of the 4th …, 2024 - dl.acm.org
We present ALTO, a network orchestrator for efficiently serving compound AI systems such
as pipelines of language models. ALTO leverages an optimization opportunity specific to …