EAGLE-2: Faster inference of language models with dynamic draft trees

Y Li, F Wei, C Zhang, H Zhang - arXiv preprint arXiv:2406.16858, 2024 - arxiv.org
Inference with modern Large Language Models (LLMs) is expensive and time-consuming,
and speculative sampling has proven to be an effective solution. Most speculative sampling …

From Natural Language to SQL: Review of LLM-based Text-to-SQL Systems

A Mohammadjafari, AS Maida… - arXiv preprint arXiv …, 2024 - arxiv.org
Since the onset of LLMs, translating natural language queries to structured SQL commands
is assuming increasing importance. Unlike the previous reviews, this survey provides a comprehensive …

Accelerating auto-regressive text-to-image generation with training-free speculative Jacobi decoding

Y Teng, H Shi, X Liu, X Ning, G Dai, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
The current large auto-regressive models can generate high-quality, high-resolution images,
but these models require hundreds or even thousands of steps of next-token prediction …

Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL

Z Hong, Z Yuan, Q Zhang, H Chen, J Dong… - arXiv preprint arXiv …, 2024 - arxiv.org
Generating accurate SQL according to natural language questions (text-to-SQL) is a long-
standing problem since it is challenging in user question understanding, database schema …

ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality

Y He, F Chen, Y He, S He, H Zhou, K Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we propose ZipAR, a training-free, plug-and-play parallel decoding framework
for accelerating auto-regressive (AR) visual generation. The motivation stems from the …

Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling

X Luo, Y Wang, Q Zhu, Z Zhang, X Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid growth in the parameters of large language models (LLMs) has made inference
latency a fundamental bottleneck, limiting broader application of LLMs. Speculative …

Closed-loop long-horizon robotic planning via equilibrium sequence modeling

J Li, Z Sun, F Li, C Sheng, J Yu, Y Mu - arXiv preprint arXiv:2410.01440, 2024 - arxiv.org
In the endeavor to make autonomous robots take actions, task planning is a major challenge
that requires translating high-level task descriptions into long-horizon action sequences …

Governing Open Vocabulary Data Leaks Using an Edge LLM through Programming by Example

Q Li, J Wen, H Jin - Proceedings of the ACM on Interactive, Mobile …, 2024 - dl.acm.org
A major concern with integrating large language model (LLM) services (e.g., ChatGPT) into
workplaces is that employees may inadvertently leak sensitive information through their …

Remote Timing Attacks on Efficient Language Model Inference

N Carlini, M Nasr - arXiv preprint arXiv:2410.17175, 2024 - arxiv.org
Scaling up language models has significantly increased their capabilities. But larger models
are slower models, and so there is now an extensive body of work (e.g., speculative sampling …

Parallelized Autoregressive Visual Generation

Y Wang, S Ren, Z Lin, Y Han, H Guo, Z Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Autoregressive models have emerged as a powerful approach for visual generation but
suffer from slow inference speed due to their sequential token-by-token prediction process …