Advancing transformer architecture in long-context large language models: A comprehensive survey

Y Huang, J Xu, J Lai, Z Jiang, T Chen, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org
With the surge of interest ignited by ChatGPT, Transformer-based Large Language Models (LLMs)
have paved a revolutionary path toward Artificial General Intelligence (AGI) and have been …

Harnessing the power of llms in practice: A survey on chatgpt and beyond

J Yang, H Jin, R Tang, X Han, Q Feng, H Jiang… - ACM Transactions on …, 2024 - dl.acm.org
This article presents a comprehensive and practical guide for practitioners and end-users
working with Large Language Models (LLMs) in their downstream Natural Language …

Longrag: Enhancing retrieval-augmented generation with long-context llms

Z Jiang, X Ma, W Chen - arXiv preprint arXiv:2406.15319, 2024 - arxiv.org
In the traditional RAG framework, the basic retrieval units are normally short. Common
retrievers like DPR typically work with 100-word Wikipedia paragraphs. Such a design …

Tensor attention training: Provably efficient learning of higher-order transformers

Y Liang, Z Shi, Z Song, Y Zhou - arXiv preprint arXiv:2405.16411, 2024 - arxiv.org
Tensor Attention, a multi-view attention that is able to capture high-order correlations among
multiple modalities, can overcome the representational limitations of classical matrix …

Trends and challenges of real-time learning in large language models: A critical review

M Jovanovic, P Voss - arXiv preprint arXiv:2404.18311, 2024 - arxiv.org
Real-time learning concerns the ability of learning systems to acquire knowledge over time,
enabling their adaptation and generalization to novel tasks. It is a critical ability for …

Conv-basis: A new paradigm for efficient attention inference and gradient computation in transformers

Y Liang, H Liu, Z Shi, Z Song, Z Xu, J Yin - arXiv preprint arXiv:2405.05219, 2024 - arxiv.org
The self-attention mechanism is the key to the success of transformers in recent Large
Language Models (LLMs). However, the quadratic computational cost $O(n^2)$ in the …

Large language models in biomedical natural language processing: benchmarks, baselines, and recommendations

Q Chen, J Du, Y Hu, V Kuttichi Keloth, X Peng… - arXiv e …, 2023 - ui.adsabs.harvard.edu
Biomedical literature is growing rapidly, making it challenging to curate and extract
knowledge manually. Biomedical natural language processing (BioNLP) techniques that …

Imagefolder: Autoregressive image generation with folded tokens

X Li, K Qiu, H Chen, J Kuen, J Gu, B Raj… - arXiv preprint arXiv …, 2024 - arxiv.org
Image tokenizers are crucial for visual generative models, e.g., diffusion models (DMs) and
autoregressive (AR) models, as they construct the latent representation for modeling …

How to train long-context language models (effectively)

T Gao, A Wettig, H Yen, D Chen - arXiv preprint arXiv:2410.02660, 2024 - arxiv.org
We study continued training and supervised fine-tuning (SFT) of a language model (LM) to
make effective use of long-context information. We first establish a reliable evaluation …

Longwriter: Unleashing 10,000+ word generation from long context llms

Y Bai, J Zhang, X Lv, L Zheng, S Zhu, L Hou… - arXiv preprint arXiv …, 2024 - arxiv.org
Current long-context large language models (LLMs) can process inputs of up to 100,000
tokens, yet struggle to generate outputs exceeding even a modest length of 2,000 words …