Lost in the middle: How language models use long contexts

NF Liu, K Lin, J Hewitt, A Paranjape… - Transactions of the …, 2024 - direct.mit.edu
While recent language models can take long contexts as input, relatively little
is known about how well they use longer contexts. We analyze the performance of language …
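The core measurement here is how answer accuracy varies with where the relevant document sits in the input. A minimal sketch of such a position-sensitivity probe follows; `generate` stands in for any LLM completion function, and the document/question format is illustrative rather than the paper's exact setup:

```python
from typing import Callable, List

def position_probe(generate: Callable[[str], str],
                   relevant_doc: str,
                   distractor_docs: List[str],
                   question: str,
                   answer: str) -> List[bool]:
    """Insert the relevant document at each position among distractors
    and record whether the model's output contains the gold answer."""
    hits = []
    for pos in range(len(distractor_docs) + 1):
        docs = distractor_docs[:pos] + [relevant_doc] + distractor_docs[pos:]
        context = "\n\n".join(f"Document {i+1}: {d}" for i, d in enumerate(docs))
        prompt = f"{context}\n\nQuestion: {question}\nAnswer:"
        hits.append(answer.lower() in generate(prompt).lower())
    return hits
```

Plotting the resulting hit rate against position is what reveals a "lost in the middle" dip, if one exists.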

UL2: Unifying language learning paradigms

Y Tay, M Dehghani, VQ Tran, X Garcia, J Wei… - arXiv preprint arXiv …, 2022 - arxiv.org
Existing pre-trained models are generally geared towards a particular class of problems. To
date, there still seems to be no consensus on what the right architecture and pre-training …
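UL2's central mechanism is a mixture of denoisers: the same model is trained on span-corruption objectives that differ in span length and corruption rate (short sparse spans, long or dense spans, and prefix continuation). A rough illustration of configurable span corruption, not the paper's exact sampler:

```python
import random

def corrupt_spans(tokens, mean_span_len, corruption_rate, seed=0):
    """Mask contiguous spans; return (corrupted input, target spans).
    Small mean_span_len with a low rate approximates regular denoising;
    long spans or a high rate approximate extreme denoising (sketch only)."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(tokens) * corruption_rate))
    masked = set()
    while len(masked) < n_mask:
        start = rng.randrange(len(tokens))
        span = max(1, int(rng.expovariate(1 / mean_span_len)))
        masked.update(range(start, min(start + span, len(tokens))))
    inputs, targets, sentinel = [], [], 0
    i = 0
    while i < len(tokens):
        if i in masked:
            inputs.append(f"<extra_id_{sentinel}>")
            targets.append(f"<extra_id_{sentinel}>")
            while i < len(tokens) and i in masked:
                targets.append(tokens[i])
                i += 1
            sentinel += 1
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets
```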

Personality traits in large language models

M Safdari, G Serapio-García, C Crepy, S Fitz… - arXiv preprint arXiv …, 2023 - arxiv.org
The advent of large language models (LLMs) has revolutionized natural language
processing, enabling the generation of coherent and contextually relevant text. As LLMs …

Nugget: Neural agglomerative embeddings of text

G Qin, B Van Durme - International Conference on Machine …, 2023 - proceedings.mlr.press
Embedding text sequences is a widespread requirement in modern language
understanding. Existing approaches focus largely on constant-size representations. This is …
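The contrast drawn in the snippet is between constant-size sequence embeddings and a representation whose size grows with the text. A toy version of the selection step that makes that possible, assuming precomputed token embeddings and a learned scoring vector (both hypothetical names):

```python
import numpy as np

def select_nuggets(token_embs: np.ndarray, scorer_w: np.ndarray,
                   ratio: float = 0.1) -> np.ndarray:
    """Score each token embedding and keep the top fraction as a
    variable-size sequence representation (illustrative only)."""
    scores = token_embs @ scorer_w           # (seq_len,)
    k = max(1, int(len(scores) * ratio))     # longer text -> more nuggets
    keep = np.sort(np.argsort(scores)[-k:])  # preserve original order
    return token_embs[keep]
```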

MARG: Multi-agent review generation for scientific papers

M D'Arcy, T Hope, L Birnbaum, D Downey - arXiv preprint arXiv …, 2024 - arxiv.org
We study the ability of LLMs to generate feedback for scientific papers and develop MARG, a
feedback generation approach using multiple LLM instances that engage in internal …
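The multi-instance design lets the full paper be split across agents whose reports are then synthesized, so no single context must hold everything. A bare-bones sketch of that pattern, with `generate` standing in for any LLM call and prompts that are illustrative rather than MARG's:

```python
from typing import Callable, List

def multi_agent_review(generate: Callable[[str], str],
                       paper_chunks: List[str]) -> str:
    """Each worker agent reads one chunk and reports; a leader agent
    synthesizes the reports into review feedback."""
    reports = [
        generate(f"You are a reviewer reading part {i+1} of a paper.\n"
                 f"{chunk}\n\nSummarize strengths and weaknesses.")
        for i, chunk in enumerate(paper_chunks)
    ]
    return generate("Combine these partial reports into specific, "
                    "actionable review feedback:\n\n" + "\n\n".join(reports))
```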

Selective Perception: Learning Concise State Descriptions for Language Model Actors

K Nottingham, Y Razeghi, K Kim… - Proceedings of the …, 2024 - aclanthology.org
The latest large language models (LMs) support increasingly long contexts. While this
trend permits using substantial amounts of text with SOTA LMs, requiring these large LMs to …
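The approach implied by the title is to compress a large environment state into a concise description before handing it to the actor LM, rather than paying for the full state at every step. A hedged sketch of that two-stage pattern (`generate` is any LLM call; prompts are illustrative):

```python
from typing import Callable

def act_with_concise_state(generate: Callable[[str], str],
                           full_state: str, task: str) -> str:
    """Summarize the state down to task-relevant facts, then let the
    actor choose an action from the compressed description."""
    concise = generate(f"Task: {task}\nState:\n{full_state}\n\n"
                       "List only the facts relevant to the task.")
    return generate(f"Task: {task}\nRelevant facts:\n{concise}\n\n"
                    "Next action:")
```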

Length-Aware Multi-Kernel Transformer for Long Document Classification

G Han, J Tsao, X Huang - arXiv preprint arXiv:2405.07052, 2024 - arxiv.org
Lengthy documents pose a unique challenge to neural language models due to substantial
memory consumption. While existing state-of-the-art (SOTA) models segment long texts into …
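From the title and snippet, the general idea appears to be applying kernels of several lengths in parallel so features are captured at multiple scales without attending over the whole document at once. A toy NumPy illustration of multi-kernel feature extraction over token embeddings; the shapes and max-pooling choice are assumptions, not the paper's architecture:

```python
import numpy as np

def multi_kernel_features(embs: np.ndarray, kernels: list) -> np.ndarray:
    """embs: (seq_len, dim). Each kernel has shape (width, dim, out_dim).
    Slide each kernel over the sequence, max-pool over positions, and
    concatenate the pooled features. Assumes seq_len >= every width."""
    feats = []
    for w in kernels:
        width, dim, out = w.shape
        windows = np.stack([embs[i:i + width].reshape(-1)
                            for i in range(len(embs) - width + 1)])
        pooled = (windows @ w.reshape(width * dim, out)).max(axis=0)
        feats.append(pooled)
    return np.concatenate(feats)

# e.g. kernels of width 3, 5, and 7 capture short- and longer-range n-grams
```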

Multilingual needle in a haystack: Investigating long-context behavior of multilingual large language models

A Hengle, P Bajpai, S Dan, T Chakraborty - arXiv preprint arXiv …, 2024 - arxiv.org
While recent large language models (LLMs) demonstrate remarkable abilities in responding
to queries in diverse languages, their ability to handle long multilingual contexts is …
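A needle-in-a-haystack test hides a short fact (the "needle") at a controlled depth inside long filler text and asks the model to retrieve it; the multilingual variant varies the languages of needle and haystack independently. A minimal constructor for such a test case (the sentences and sweep are placeholders, not the paper's data):

```python
def build_niah_prompt(filler_sentences, needle, depth, question):
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end)
    of the haystack, then append the retrieval question."""
    pos = int(len(filler_sentences) * depth)
    sents = filler_sentences[:pos] + [needle] + filler_sentences[pos:]
    return " ".join(sents) + f"\n\nQuestion: {question}\nAnswer:"

# e.g. English haystack with a Hindi needle, sweeping depth over
# {0.0, 0.25, 0.5, 0.75, 1.0} and scoring retrieval accuracy per cell
```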

CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation

AB Hou, O Weller, G Qin, E Yang, D Lawrie… - arXiv preprint arXiv …, 2024 - arxiv.org
Legal professionals need to write analyses that rely on citations to relevant precedents, i.e.,
previous case decisions. Intelligent systems assisting legal professionals in writing such …
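The dataset targets a retrieve-then-generate pipeline: find relevant precedents, then draft the analysis conditioned on them. A skeletal version of that pipeline, using naive token-overlap ranking as a stand-in for a real retriever and `generate` for any LLM call:

```python
from typing import Callable, List

def retrieval_augmented_analysis(generate: Callable[[str], str],
                                 case_text: str,
                                 precedents: List[str],
                                 k: int = 3) -> str:
    """Rank precedents by token overlap with the case (placeholder for
    a real retriever), then generate an analysis citing the top-k."""
    query = set(case_text.lower().split())
    ranked = sorted(precedents,
                    key=lambda p: len(query & set(p.lower().split())),
                    reverse=True)
    cites = "\n\n".join(f"[{i+1}] {p}" for i, p in enumerate(ranked[:k]))
    return generate(f"Case:\n{case_text}\n\nPrecedents:\n{cites}\n\n"
                    "Write a legal analysis citing the precedents above.")
```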

Attention instruction: Amplifying attention in the middle via prompting

M Zhang, Z Meng, N Collier - arXiv preprint arXiv:2406.17095, 2024 - arxiv.org
The context window of large language models has been extended to 128k tokens or more.
However, language models still suffer from position bias and have difficulty accessing and …
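The premise is that an explicit instruction in the prompt can redirect the model toward under-attended middle positions. A minimal sketch of prepending such a position-directing instruction; the wording is illustrative, not the paper's exact prompt:

```python
def with_attention_instruction(documents, question,
                               region="the middle documents"):
    """Prepend an instruction telling the model which region of the
    context to weight when answering."""
    ctx = "\n\n".join(f"Document {i+1}: {d}"
                      for i, d in enumerate(documents))
    return (f"Pay particular attention to {region} when answering.\n\n"
            f"{ctx}\n\nQuestion: {question}\nAnswer:")
```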