Eagle-2: Faster inference of language models with dynamic draft trees
Inference with modern Large Language Models (LLMs) is expensive and time-consuming,
and speculative sampling has proven to be an effective solution. Most speculative sampling …
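As background for the speculative-sampling entries above, one round of draft-and-verify can be sketched as follows. This is a toy illustration with stand-in callables for the draft and target models, not Eagle-2's actual interface or draft-tree construction:

```python
import random

def speculative_decode_step(draft_sample, target_accept_prob, k=4):
    """One round of speculative sampling: the draft model proposes k
    tokens; the target model verifies them left-to-right and keeps the
    longest accepted prefix. `draft_sample` and `target_accept_prob`
    are hypothetical stand-ins for real model calls."""
    proposals = [draft_sample() for _ in range(k)]
    accepted = []
    for tok in proposals:
        if random.random() < target_accept_prob(tok):
            accepted.append(tok)   # target agrees: keep the draft token
        else:
            break                  # first rejection ends the round
    return accepted
```

The speedup comes from verifying all k draft tokens in one target-model forward pass instead of k sequential passes; dynamic draft trees (as in Eagle-2) extend this by proposing a tree of candidates rather than a single chain.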
From Natural Language to SQL: Review of LLM-based Text-to-SQL Systems
A Mohammadjafari, AS Maida… - arXiv preprint arXiv …, 2024 - arxiv.org
Since the advent of LLMs, translating natural language queries to structured SQL commands
has assumed increasing importance. Unlike previous reviews, this survey provides a comprehensive …
Accelerating auto-regressive text-to-image generation with training-free speculative jacobi decoding
The current large auto-regressive models can generate high-quality, high-resolution images,
but these models require hundreds or even thousands of steps of next-token prediction …
Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL
Generating accurate SQL according to natural language questions (text-to-SQL) is a long-
standing problem since it is challenging in user question understanding, database schema …
ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality
In this paper, we propose ZipAR, a training-free, plug-and-play parallel decoding framework
for accelerating auto-regressive (AR) visual generation. The motivation stems from the …
Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling
The rapid growth in the parameters of large language models (LLMs) has made inference
latency a fundamental bottleneck, limiting broader application of LLMs. Speculative …
Closed-loop long-horizon robotic planning via equilibrium sequence modeling
In the endeavor to make autonomous robots take actions, task planning is a major challenge
that requires translating high-level task descriptions into long-horizon action sequences …
Governing Open Vocabulary Data Leaks Using an Edge LLM through Programming by Example
A major concern with integrating large language model (LLM) services (eg, ChatGPT) into
workplaces is that employees may inadvertently leak sensitive information through their …
Remote Timing Attacks on Efficient Language Model Inference
Scaling up language models has significantly increased their capabilities. But larger models
are slower models, and so there is now an extensive body of work (eg, speculative sampling …
Parallelized Autoregressive Visual Generation
Autoregressive models have emerged as a powerful approach for visual generation but
suffer from slow inference speed due to their sequential token-by-token prediction process …