Big bird: Transformers for longer sequences

Z Ji, N Lee, R Frieske, T Yu, D Su, Y Xu, E Ishii… - ACM Computing …, 2023 - dl.acm.org

Natural Language Generation (NLG) has improved exponentially in recent years thanks to
the development of sequence-to-sequence deep learning technologies such as Transformer …

被引用次数：2287 相关文章所有 7 个版本

[PDF] nature.com

Current progress and open challenges for applying deep learning across the biosciences

N Sapoval, A Aghazadeh, MG Nute… - Nature …, 2022 - nature.com

Deep Learning (DL) has recently enabled unprecedented advances in one of the grand
challenges in computational biology: the half-century-old problem of protein structure …

被引用次数：199 相关文章所有 16 个版本

[PDF] arxiv.org

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org

Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

被引用次数：2246 相关文章所有 4 个版本

[PDF] mit.edu

Lost in the middle: How language models use long contexts

NF Liu, K Lin, J Hewitt, A Paranjape… - Transactions of the …, 2024 - direct.mit.edu

While recent language models have the ability to take long contexts as input, relatively little
is known about how well they use longer context. We analyze the performance of language …

被引用次数：643 相关文章所有 11 个版本

[PDF] royalsocietypublishing.org

Gpt-4 passes the bar exam

DM Katz, MJ Bommarito, S Gao… - … Transactions of the …, 2024 - royalsocietypublishing.org

In this paper, we experimentally evaluate the zero-shot performance of GPT-4 against prior
generations of GPT on the entire uniform bar examination (UBE), including not only the …

被引用次数：327 相关文章所有 11 个版本

[PDF] github.io

The rise and potential of large language model based agents: A survey

Z Xi, W Chen, X Guo, W He, Y Ding, B Hong… - arXiv preprint arXiv …, 2023 - arxiv.org

For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing
the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are …

被引用次数：432 相关文章所有 4 个版本

[PDF] neurips.cc

Flashattention: Fast and memory-efficient exact attention with io-awareness

T Dao, D Fu, S Ermon, A Rudra… - Advances in Neural …, 2022 - proceedings.neurips.cc

Transformers are slow and memory-hungry on long sequences, since the time and memory
complexity of self-attention are quadratic in sequence length. Approximate attention …

被引用次数：1121 相关文章所有 10 个版本

[PDF] arxiv.org

Rwkv: Reinventing rnns for the transformer era

B Peng, E Alcaide, Q Anthony, A Albalak… - arXiv preprint arXiv …, 2023 - arxiv.org

Transformers have revolutionized almost all natural language processing (NLP) tasks but
suffer from memory and computational complexity that scales quadratically with sequence …

被引用次数：275 相关文章所有 9 个版本

[PDF] neurips.cc

Hyenadna: Long-range genomic sequence modeling at single nucleotide resolution

E Nguyen, M Poli, M Faizi, A Thomas… - Advances in neural …, 2024 - proceedings.neurips.cc

Genomic (DNA) sequences encode an enormous amount of information for gene regulation
and protein synthesis. Similar to natural language models, researchers have proposed …

被引用次数：126 相关文章所有 8 个版本

[PDF] neurips.cc

Recipe for a general, powerful, scalable graph transformer

L Rampášek, M Galkin, VP Dwivedi… - Advances in …, 2022 - proceedings.neurips.cc

We propose a recipe on how to build a general, powerful, scalable (GPS) graph Transformer
with linear complexity and state-of-the-art results on a diverse set of benchmarks. Graph …

被引用次数：422 相关文章所有 6 个版本