A survey of resource-efficient LLM and multimodal foundation models
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine …
AlphaZero-like tree-search can guide large language model decoding and training
Recent works like Tree-of-Thought (ToT) and Reasoning via Planning (RAP) aim to augment
the multi-step reasoning capabilities of LLMs by using tree-search algorithms. These …
A survey on efficient inference for large language models
Large Language Models (LLMs) have attracted extensive attention due to their remarkable
performance across various tasks. However, the substantial computational and memory …
Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention
Interacting with humans through multi-turn conversations is a fundamental feature of large
language models (LLMs). However, existing LLM serving engines executing multi-turn …
RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) has shown significant improvements in various
natural language processing tasks by integrating the strengths of large language models …
LLM as a system service on mobile devices
Being more powerful and intrusive into user-device interactions, LLMs are eager for on-
device execution to better preserve user privacy. In this work, we propose a new paradigm of …
Optimizing LLM queries in relational workloads
Analytical database providers (e.g., Redshift, Databricks, BigQuery) have rapidly added
support for invoking Large Language Models (LLMs) through native user-defined functions …
Rethinking software engineering in the era of foundation models: A curated catalogue of challenges in the development of trustworthy FMware
Foundation models (FMs), such as Large Language Models (LLMs), have revolutionized
software development by enabling new use cases and business models. We refer to …
Octopus: On-device language model for function calling of software APIs
In the rapidly evolving domain of artificial intelligence, Large Language Models (LLMs) play
a crucial role due to their advanced text processing and generation abilities. This study …
ALTO: An Efficient Network Orchestrator for Compound AI Systems
K Santhanam, D Raghavan, MS Rahman… - Proceedings of the 4th …, 2024 - dl.acm.org
We present ALTO, a network orchestrator for efficiently serving compound AI systems such
as pipelines of language models. ALTO leverages an optimization opportunity specific to …