Faith and fate: Limits of transformers on compositionality

N Dziri, X Lu, M Sclar, XL Li, L Jiang… - Advances in …, 2024 - proceedings.neurips.cc
Transformer large language models (LLMs) have sparked admiration for their exceptional
performance on tasks that demand intricate multi-step reasoning. Yet, these models …

Towards revealing the mystery behind chain of thought: a theoretical perspective

G Feng, B Zhang, Y Gu, H Ye, D He… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recent studies have discovered that Chain-of-Thought prompting (CoT) can dramatically
improve the performance of Large Language Models (LLMs), particularly when dealing with …

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

In-context language learning: Architectures and algorithms

E Akyürek, B Wang, Y Kim, J Andreas - arXiv preprint arXiv:2401.12973, 2024 - arxiv.org
Large-scale neural language models exhibit a remarkable capacity for in-context learning
(ICL): they can infer novel functions from datasets provided as input. Most of our current …

Tighter bounds on the expressivity of transformer encoders

D Chiang, P Cholak, A Pillay - International Conference on …, 2023 - proceedings.mlr.press
Characterizing neural networks in terms of better-understood formal systems has the
potential to yield new insights into the power and limitations of these networks. Doing so for …

The expressive power of transformers with chain of thought

W Merrill, A Sabharwal - arXiv preprint arXiv:2310.07923, 2023 - arxiv.org
Recent theoretical work has identified surprisingly simple reasoning problems, such as
checking if two nodes in a graph are connected or simulating finite-state machines, that are …

What formal languages can transformers express? A survey

L Strobl, W Merrill, G Weiss, D Chiang… - Transactions of the …, 2024 - direct.mit.edu
As transformers have gained prominence in natural language processing, some researchers
have investigated theoretically what problems they can and cannot solve, by treating …

A logic for expressing log-precision transformers

W Merrill, A Sabharwal - Advances in Neural Information …, 2024 - proceedings.neurips.cc
One way to interpret the reasoning power of transformer-based language models is to
describe the types of logical rules they can resolve over some input text. Recently, Chiang et …

xLSTM: Extended Long Short-Term Memory

M Beck, K Pöppel, M Spanring, A Auer… - arXiv preprint arXiv …, 2024 - arxiv.org
In the 1990s, the constant error carousel and gating were introduced as the central ideas of
the Long Short-Term Memory (LSTM). Since then, LSTMs have stood the test of time and …

The illusion of state in state-space models

W Merrill, J Petty, A Sabharwal - arXiv preprint arXiv:2404.08819, 2024 - arxiv.org
State-space models (SSMs) have emerged as a potential alternative architecture for building
large language models (LLMs) compared to the previously ubiquitous transformer …