Megabyte: Predicting million-byte sequences with multiscale transformers

L Yu, D Simig, C Flaherty… - Advances in …, 2024 - proceedings.neurips.cc
Autoregressive transformers are spectacular models for short sequences but scale poorly to
long sequences such as high-resolution images, podcasts, code, or books. We proposed …

Transformers learn shortcuts to automata

B Liu, JT Ash, S Goel, A Krishnamurthy… - arXiv preprint arXiv …, 2022 - arxiv.org
Algorithmic reasoning requires capabilities which are most naturally understood through
recurrent models of computation, like the Turing machine. However, Transformer models …

Looped transformers as programmable computers

A Giannou, S Rajput, J Sohn, K Lee… - International …, 2023 - proceedings.mlr.press
We present a framework for using transformer networks as universal computers by
programming them with specific weights and placing them in a loop. Our input sequence …

Long range language modeling via gated state spaces

H Mehta, A Gupta, A Cutkosky, B Neyshabur - arXiv preprint arXiv …, 2022 - arxiv.org
State space models have been shown to be effective at modeling long-range dependencies,
especially on sequence classification tasks. In this work we focus on autoregressive …

A length-extrapolatable transformer

Y Sun, L Dong, B Patra, S Ma, S Huang… - arXiv preprint arXiv …, 2022 - arxiv.org
Position modeling plays a critical role in Transformers. In this paper, we focus on length
extrapolation, i.e., training on short texts while evaluating longer sequences. We define …

Mamba-360: Survey of state space models as transformer alternative for long sequence modelling: Methods, applications, and challenges

BN Patro, VS Agneeswaran - arXiv preprint arXiv:2404.16112, 2024 - arxiv.org
Sequence modeling is a crucial area across various domains, including Natural Language
Processing (NLP), speech recognition, time series forecasting, music generation, and …

Scaling transformer to 1M tokens and beyond with RMT

A Bulatov, Y Kuratov, Y Kapushev… - arXiv preprint arXiv …, 2023 - arxiv.org
A major limitation for the broader scope of problems solvable by transformers is the
quadratic scaling of computational complexity with input size. In this study, we investigate …

The What, Why, and How of Context Length Extension Techniques in Large Language Models--A Detailed Survey

S Pawar, SM Tonmoy, SM Zaman, V Jain… - arXiv preprint arXiv …, 2024 - arxiv.org
The advent of Large Language Models (LLMs) represents a notable breakthrough in Natural
Language Processing (NLP), contributing to substantial progress in both text …

Block-state transformers

J Pilault, M Fathi, O Firat, C Pal… - Advances in Neural …, 2024 - proceedings.neurips.cc
State space models (SSMs) have shown impressive results on tasks that require modeling
long-range dependencies and efficiently scale to long sequences owing to their …

Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng… - arXiv preprint arXiv …, 2023 - researchgate.net
Large Language Models (LLMs) have demonstrated remarkable capabilities in
important tasks such as natural language understanding, language generation, and …