Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents

Z Zhang, Y Yao, A Zhang, X Tang, X Ma, Z He… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have dramatically enhanced the field of language
intelligence, as evidenced by their formidable empirical performance across a …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

M Reid, N Savinov, D Teplyashin, D Lepikhin… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we present the latest model of the Gemini family, Gemini 1.5 Pro, a highly
compute-efficient multimodal mixture-of-experts model capable of recalling and reasoning …

Yi: Open foundation models by 01.AI

A Young, B Chen, C Li, C Huang, G Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce the Yi model family, a series of language and multimodal models that
demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and …

The unlocking spell on base LLMs: Rethinking alignment via in-context learning

BY Lin, A Ravichander, X Lu, N Dziri… - The Twelfth …, 2023 - openreview.net
Alignment tuning has become the de facto standard practice for enabling base large
language models (LLMs) to serve as open-domain AI assistants. The alignment tuning …

LongLLMLingua: Accelerating and enhancing LLMs in long context scenarios via prompt compression

H Jiang, Q Wu, X Luo, D Li, CY Lin, Y Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
In long context scenarios, large language models (LLMs) face three main challenges: higher
computational/financial cost, longer latency, and inferior performance. Some studies reveal …

LLM maybe LongLM: Self-extend LLM context window without tuning

H Jin, X Han, J Yang, Z Jiang, Z Liu, CY Chang… - arXiv preprint arXiv …, 2024 - arxiv.org
This work elicits LLMs' inherent ability to handle long contexts without fine-tuning. The
limited length of training sequences may limit the application of Large …

ChatGPT's one-year anniversary: Are open-source large language models catching up?

H Chen, F Jiao, X Li, C Qin, M Ravaut, R Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org
Since its release in late 2022, ChatGPT has brought a seismic shift to the entire landscape of
AI, in both research and commerce. Through instruction-tuning a large language model …

MiniCPM: Unveiling the potential of small language models with scalable training strategies

S Hu, Y Tu, X Han, C He, G Cui, X Long… - arXiv preprint arXiv …, 2024 - arxiv.org
The burgeoning interest in developing Large Language Models (LLMs) with up to a trillion
parameters has been met with concerns regarding resource efficiency and practical …

Data engineering for scaling language models to 128K context

Y Fu, R Panda, X Niu, X Yue, H Hajishirzi, Y Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
We study the continual pretraining recipe for scaling language models' context lengths to
128K, with a focus on data engineering. We hypothesize that long context modeling, in …