Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents
Large language models (LLMs) have dramatically enhanced the field of language
intelligence, as evidenced by their formidable empirical performance across a …
A survey of large language models
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
M Reid, N Savinov, D Teplyashin, D Lepikhin… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we present the latest model of the Gemini family, Gemini 1.5 Pro, a highly
compute-efficient multimodal mixture-of-experts model capable of recalling and reasoning …
Yi: Open foundation models by 01.AI
We introduce the Yi model family, a series of language and multimodal models that
demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and …
The unlocking spell on base LLMs: Rethinking alignment via in-context learning
Alignment tuning has become the de facto standard practice for enabling base large
language models (LLMs) to serve as open-domain AI assistants. The alignment tuning …
LongLLMLingua: Accelerating and enhancing LLMs in long context scenarios via prompt compression
In long context scenarios, large language models (LLMs) face three main challenges: higher
computational/financial cost, longer latency, and inferior performance. Some studies reveal …
LLM Maybe LongLM: Self-extend LLM context window without tuning
This work elicits LLMs' inherent ability to handle long contexts without fine-tuning. The
limited length of training sequences may limit the application of Large …
ChatGPT's one-year anniversary: Are open-source large language models catching up?
Since its release in late 2022, ChatGPT has brought a seismic shift in the entire landscape of
AI, both in research and commerce. Through instruction-tuning a large language model …
MiniCPM: Unveiling the potential of small language models with scalable training strategies
The burgeoning interest in developing Large Language Models (LLMs) with up to a trillion
parameters has been met with concerns regarding resource efficiency and practical …
Data engineering for scaling language models to 128k context
We study the continual pretraining recipe for scaling language models' context lengths to
128K, with a focus on data engineering. We hypothesize that long context modeling, in …