Halos: Hashing large output space for cheap inference

Citing articles

[PDF] mlr.press

Deja vu: Contextual sparsity for efficient llms at inference time

Z Liu, J Wang, T Dao, T Zhou, B Yuan… - International …, 2023 - proceedings.mlr.press

Large language models (LLMs) with hundreds of billions of parameters have sparked a new
wave of exciting AI applications. However, they are computationally expensive at inference …

Cited by 206 · All 7 versions