Lost in the middle: How language models use long contexts

NF Liu, K Lin, J Hewitt, A Paranjape… - Transactions of the …, 2024 - direct.mit.edu
While recent language models can take long contexts as input, relatively little
is known about how well they use longer contexts. We analyze the performance of language …
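The core measurement here is how answer accuracy varies with where the relevant document sits in the input. A minimal sketch of such a position-sensitivity probe follows; `generate` stands in for any LLM completion function, and the document/question format is illustrative rather than the paper's exact setup:

```python
from typing import Callable, List

def position_probe(generate: Callable[[str], str],
                   relevant_doc: str,
                   distractor_docs: List[str],
                   question: str,
                   answer: str) -> List[bool]:
    """Insert the relevant document at each position among distractors
    and record whether the model's output contains the gold answer."""
    hits = []
    for pos in range(len(distractor_docs) + 1):
        docs = distractor_docs[:pos] + [relevant_doc] + distractor_docs[pos:]
        context = "\n\n".join(f"Document {i+1}: {d}" for i, d in enumerate(docs))
        prompt = f"{context}\n\nQuestion: {question}\nAnswer:"
        hits.append(answer.lower() in generate(prompt).lower())
    return hits
```

Plotting the resulting hit rate against position is what reveals a "lost in the middle" dip, if one exists.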

UL2: Unifying language learning paradigms

Y Tay, M Dehghani, VQ Tran, X Garcia, J Wei… - arXiv preprint arXiv …, 2022 - arxiv.org
Existing pre-trained models are generally geared towards a particular class of problems. To
date, there still seems to be no consensus on what the right architecture and pre-training …
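UL2's central mechanism is a mixture of denoisers: the same model is trained on span-corruption objectives that differ in span length and corruption rate (short sparse spans, long or dense spans, and prefix continuation). A rough illustration of configurable span corruption, not the paper's exact sampler:

```python
import random

def corrupt_spans(tokens, mean_span_len, corruption_rate, seed=0):
    """Mask contiguous spans; return (corrupted input, target spans).
    Small mean_span_len with a low rate approximates regular denoising;
    long spans or a high rate approximate extreme denoising (sketch only)."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(tokens) * corruption_rate))
    masked = set()
    while len(masked) < n_mask:
        start = rng.randrange(len(tokens))
        span = max(1, int(rng.expovariate(1 / mean_span_len)))
        masked.update(range(start, min(start + span, len(tokens))))
    inputs, targets, sentinel = [], [], 0
    i = 0
    while i < len(tokens):
        if i in masked:
            inputs.append(f"<extra_id_{sentinel}>")
            targets.append(f"<extra_id_{sentinel}>")
            while i < len(tokens) and i in masked:
                targets.append(tokens[i])
                i += 1
            sentinel += 1
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets
```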

Personality traits in large language models

M Safdari, G Serapio-García, C Crepy, S Fitz… - arXiv preprint arXiv …, 2023 - arxiv.org
The advent of large language models (LLMs) has revolutionized natural language
processing, enabling the generation of coherent and contextually relevant text. As LLMs …

Nugget: Neural agglomerative embeddings of text

G Qin, B Van Durme - International Conference on Machine …, 2023 - proceedings.mlr.press
Embedding text sequences is a widespread requirement in modern language
understanding. Existing approaches focus largely on constant-size representations. This is …
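The contrast drawn in the snippet is between constant-size sequence embeddings and a representation whose size grows with the text. A toy version of the selection step that makes that possible, assuming precomputed token embeddings and a learned scoring vector (both hypothetical names):

```python
import numpy as np

def select_nuggets(token_embs: np.ndarray, scorer_w: np.ndarray,
                   ratio: float = 0.1) -> np.ndarray:
    """Score each token embedding and keep the top fraction as a
    variable-size sequence representation (illustrative only)."""
    scores = token_embs @ scorer_w           # (seq_len,)
    k = max(1, int(len(scores) * ratio))     # longer text -> more nuggets
    keep = np.sort(np.argsort(scores)[-k:])  # preserve original order
    return token_embs[keep]
```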

MARG: Multi-agent review generation for scientific papers

M D'Arcy, T Hope, L Birnbaum, D Downey - arXiv preprint arXiv …, 2024 - arxiv.org
We study the ability of LLMs to generate feedback for scientific papers and develop MARG, a
feedback generation approach using multiple LLM instances that engage in internal …
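The multi-instance design lets the full paper be split across agents whose reports are then synthesized, so no single context must hold everything. A bare-bones sketch of that pattern, with `generate` standing in for any LLM call and prompts that are illustrative rather than MARG's:

```python
from typing import Callable, List

def multi_agent_review(generate: Callable[[str], str],
                       paper_chunks: List[str]) -> str:
    """Each worker agent reads one chunk and reports; a leader agent
    synthesizes the reports into review feedback."""
    reports = [
        generate(f"You are a reviewer reading part {i+1} of a paper.\n"
                 f"{chunk}\n\nSummarize strengths and weaknesses.")
        for i, chunk in enumerate(paper_chunks)
    ]
    return generate("Combine these partial reports into specific, "
                    "actionable review feedback:\n\n" + "\n\n".join(reports))
```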

Selective Perception: Learning Concise State Descriptions for Language Model Actors

K Nottingham, Y Razeghi, K Kim… - Proceedings of the …, 2024 - aclanthology.org
The latest large language models (LMs) support increasingly long contexts. While this
trend permits using substantial amounts of text with SOTA LMs, requiring these large LMs to …
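The approach implied by the title is to compress a large environment state into a concise description before handing it to the actor LM, rather than paying for the full state at every step. A hedged sketch of that two-stage pattern (`generate` is any LLM call; prompts are illustrative):

```python
from typing import Callable

def act_with_concise_state(generate: Callable[[str], str],
                           full_state: str, task: str) -> str:
    """Summarize the state down to task-relevant facts, then let the
    actor choose an action from the compressed description."""
    concise = generate(f"Task: {task}\nState:\n{full_state}\n\n"
                       "List only the facts relevant to the task.")
    return generate(f"Task: {task}\nRelevant facts:\n{concise}\n\n"
                    "Next action:")
```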

Length-Aware Multi-Kernel Transformer for Long Document Classification

G Han, J Tsao, X Huang - arXiv preprint arXiv:2405.07052, 2024 - arxiv.org
Lengthy documents pose a unique challenge to neural language models due to substantial
memory consumption. While existing state-of-the-art (SOTA) models segment long texts into …
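From the title and snippet, the general idea appears to be applying kernels of several lengths in parallel so features are captured at multiple scales without attending over the whole document at once. A toy NumPy illustration of multi-kernel feature extraction over token embeddings; the shapes and max-pooling choice are assumptions, not the paper's architecture:

```python
import numpy as np

def multi_kernel_features(embs: np.ndarray, kernels: list) -> np.ndarray:
    """embs: (seq_len, dim). Each kernel has shape (width, dim, out_dim).
    Slide each kernel over the sequence, max-pool over positions, and
    concatenate the pooled features. Assumes seq_len >= every width."""
    feats = []
    for w in kernels:
        width, dim, out = w.shape
        windows = np.stack([embs[i:i + width].reshape(-1)
                            for i in range(len(embs) - width + 1)])
        pooled = (windows @ w.reshape(width * dim, out)).max(axis=0)
        feats.append(pooled)
    return np.concatenate(feats)

# e.g. kernels of width 3, 5, and 7 capture short- and longer-range n-grams
```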

Multilingual needle in a haystack: Investigating long-context behavior of multilingual large language models

A Hengle, P Bajpai, S Dan, T Chakraborty - arXiv preprint arXiv …, 2024 - arxiv.org
While recent large language models (LLMs) demonstrate remarkable abilities in responding
to queries in diverse languages, their ability to handle long multilingual contexts is …
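A needle-in-a-haystack test hides a short fact (the "needle") at a controlled depth inside long filler text and asks the model to retrieve it; the multilingual variant varies the languages of needle and haystack independently. A minimal constructor for such a test case (the sentences and sweep are placeholders, not the paper's data):

```python
def build_niah_prompt(filler_sentences, needle, depth, question):
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end)
    of the haystack, then append the retrieval question."""
    pos = int(len(filler_sentences) * depth)
    sents = filler_sentences[:pos] + [needle] + filler_sentences[pos:]
    return " ".join(sents) + f"\n\nQuestion: {question}\nAnswer:"

# e.g. English haystack with a Hindi needle, sweeping depth over
# {0.0, 0.25, 0.5, 0.75, 1.0} and scoring retrieval accuracy per cell
```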

CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation

AB Hou, O Weller, G Qin, E Yang, D Lawrie… - arXiv preprint arXiv …, 2024 - arxiv.org
Legal professionals need to write analyses that rely on citations to relevant precedents, i.e.,
previous case decisions. Intelligent systems assisting legal professionals in writing such …
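The dataset targets a retrieve-then-generate pipeline: find relevant precedents, then draft the analysis conditioned on them. A skeletal version of that pipeline, using naive token-overlap ranking as a stand-in for a real retriever and `generate` for any LLM call:

```python
from typing import Callable, List

def retrieval_augmented_analysis(generate: Callable[[str], str],
                                 case_text: str,
                                 precedents: List[str],
                                 k: int = 3) -> str:
    """Rank precedents by token overlap with the case (placeholder for
    a real retriever), then generate an analysis citing the top-k."""
    query = set(case_text.lower().split())
    ranked = sorted(precedents,
                    key=lambda p: len(query & set(p.lower().split())),
                    reverse=True)
    cites = "\n\n".join(f"[{i+1}] {p}" for i, p in enumerate(ranked[:k]))
    return generate(f"Case:\n{case_text}\n\nPrecedents:\n{cites}\n\n"
                    "Write a legal analysis citing the precedents above.")
```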

Attention instruction: Amplifying attention in the middle via prompting

M Zhang, Z Meng, N Collier - arXiv preprint arXiv:2406.17095, 2024 - arxiv.org
The context window of large language models has been extended to 128k tokens or more.
However, language models still suffer from position bias and have difficulty accessing and …
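The premise is that an explicit instruction in the prompt can redirect the model toward under-attended middle positions. A minimal sketch of prepending such a position-directing instruction; the wording is illustrative, not the paper's exact prompt:

```python
def with_attention_instruction(documents, question,
                               region="the middle documents"):
    """Prepend an instruction telling the model which region of the
    context to weight when answering."""
    ctx = "\n\n".join(f"Document {i+1}: {d}"
                      for i, d in enumerate(documents))
    return (f"Pay particular attention to {region} when answering.\n\n"
            f"{ctx}\n\nQuestion: {question}\nAnswer:")
```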