Advancing transformer architecture in long-context large language models: A comprehensive survey
With the explosion of interest ignited by ChatGPT, Transformer-based Large Language Models (LLMs)
have paved a revolutionary path toward Artificial General Intelligence (AGI) and have been …
Harnessing the power of llms in practice: A survey on chatgpt and beyond
This article presents a comprehensive and practical guide for practitioners and end-users
working with Large Language Models (LLMs) in their downstream Natural Language …
Longrag: Enhancing retrieval-augmented generation with long-context llms
In the traditional RAG framework, the basic retrieval units are normally short. Common
retrievers like DPR typically work with 100-word Wikipedia paragraphs. Such a design …
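As a rough illustration of the contrast this snippet draws, the sketch below groups short ~100-word paragraphs into much longer retrieval units that a long-context reader can consume whole. The helper `make_long_units` and the 4,000-word budget are hypothetical and are not taken from the LongRAG paper.

```python
# Hypothetical illustration: short vs. long retrieval units.
# `paragraphs` is assumed to be a list of ~100-word Wikipedia paragraphs.

def make_long_units(paragraphs, max_words=4000):
    """Greedily concatenate short paragraphs into long retrieval units."""
    units, current, count = [], [], 0
    for p in paragraphs:
        words = len(p.split())
        if count + words > max_words and current:
            units.append(" ".join(current))
            current, count = [], 0
        current.append(p)
        count += words
    if current:
        units.append(" ".join(current))
    return units

# Traditional RAG retrieves over `paragraphs` directly (short units);
# a long-context LLM can instead read the much longer units built above.
```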
Tensor attention training: Provably efficient learning of higher-order transformers
Tensor Attention, a multi-view attention that captures high-order correlations among
multiple modalities, can overcome the representational limitations of classical matrix …
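A toy numpy sketch of the third-order ("tensor") attention idea referenced here, assuming the attention scores are trilinear in one query stream and two key streams; the paper's exact parameterization and training algorithm are not reproduced.

```python
import numpy as np

def tensor_attention(Q, K1, K2, V):
    """Toy third-order attention: scores are trilinear in one query
    and two key streams (e.g., from two modalities).

    Q:  (n, d)    queries
    K1: (m, d)    keys from view/modality 1
    K2: (p, d)    keys from view/modality 2
    V:  (m, p, d) values indexed by a pair of key positions
    """
    # scores[i, j, k] = sum_d Q[i, d] * K1[j, d] * K2[k, d]
    scores = np.einsum("id,jd,kd->ijk", Q, K1, K2)
    # softmax jointly over all (j, k) pairs for each query i
    flat = scores.reshape(Q.shape[0], -1)
    flat = np.exp(flat - flat.max(axis=-1, keepdims=True))
    attn = (flat / flat.sum(axis=-1, keepdims=True)).reshape(scores.shape)
    # output[i, d] = sum_{j, k} attn[i, j, k] * V[j, k, d]
    return np.einsum("ijk,jkd->id", attn, V)
```

In contrast, classical matrix attention scores only pairs (query, key), which is the representational limit the snippet alludes to.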
Trends and challenges of real-time learning in large language models: A critical review
M Jovanovic, P Voss - arXiv preprint arXiv:2404.18311, 2024 - arxiv.org
Real-time learning concerns the ability of learning systems to acquire knowledge over time,
enabling their adaptation and generalization to novel tasks. It is a critical ability for …
Conv-basis: A new paradigm for efficient attention inference and gradient computation in transformers
The self-attention mechanism is the key to the success of transformers in recent Large
Language Models (LLMs). However, the quadratic computational cost $O(n^2)$ in the …
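For context on the quadratic cost mentioned above, a minimal sketch of vanilla softmax attention: the score matrix $QK^\top$ has $n \times n$ entries, which is where the $O(n^2)$ term in sequence length $n$ comes from. This illustrates the baseline cost only, not the conv-basis method itself.

```python
import numpy as np

def vanilla_attention(Q, K, V):
    """Standard softmax attention; Q, K, V are (n, d) arrays."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)   # (n, n) score matrix: the O(n^2) bottleneck
    scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = scores / scores.sum(axis=-1, keepdims=True)
    return weights @ V              # another O(n^2 * d) multiply
```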
Large language models in biomedical natural language processing: benchmarks, baselines, and recommendations
Biomedical literature is growing rapidly, making it challenging to curate and extract
knowledge manually. Biomedical natural language processing (BioNLP) techniques that …
Imagefolder: Autoregressive image generation with folded tokens
Image tokenizers are crucial for visual generative models, e.g., diffusion models (DMs) and
autoregressive (AR) models, as they construct the latent representation for modeling …
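To make "image tokenizer" concrete, below is a generic vector-quantization step of the kind many tokenizers use: each patch embedding is snapped to its nearest codebook entry, yielding discrete tokens an AR model can predict. This is a textbook VQ illustration, not ImageFolder's folded-token design.

```python
import numpy as np

def vq_tokenize(patch_embeddings, codebook):
    """Map each patch embedding to the index of its nearest codebook vector.

    patch_embeddings: (num_patches, d)
    codebook:         (codebook_size, d)
    Returns discrete token ids of shape (num_patches,).
    """
    # Squared distances between every patch and every codebook entry
    dists = ((patch_embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=-1)

# An AR model then predicts these token ids one by one,
# while a decoder maps tokens back to pixels.
```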
How to train long-context language models (effectively)
We study continued training and supervised fine-tuning (SFT) of a language model (LM) to
make effective use of long-context information. We first establish a reliable evaluation …
Longwriter: Unleashing 10,000+ word generation from long context llms
Current long-context large language models (LLMs) can process inputs of up to 100,000
tokens, yet struggle to generate outputs exceeding even a modest length of 2,000 words …