Faith and fate: Limits of transformers on compositionality

N Dziri, X Lu, M Sclar, XL Li, L Jiang… - Advances in …, 2024 - proceedings.neurips.cc
Transformer large language models (LLMs) have sparked admiration for their exceptional
performance on tasks that demand intricate multi-step reasoning. Yet, these models …

Towards revealing the mystery behind chain of thought: a theoretical perspective

G Feng, B Zhang, Y Gu, H Ye, D He… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recent studies have discovered that Chain-of-Thought prompting (CoT) can dramatically
improve the performance of Large Language Models (LLMs), particularly when dealing with …

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

In-context language learning: Architectures and algorithms

E Akyürek, B Wang, Y Kim, J Andreas - arXiv preprint arXiv:2401.12973, 2024 - arxiv.org
Large-scale neural language models exhibit a remarkable capacity for in-context learning
(ICL): they can infer novel functions from datasets provided as input. Most of our current …

Tighter bounds on the expressivity of transformer encoders

D Chiang, P Cholak, A Pillay - International Conference on …, 2023 - proceedings.mlr.press
Characterizing neural networks in terms of better-understood formal systems has the
potential to yield new insights into the power and limitations of these networks. Doing so for …

The expressive power of transformers with chain of thought

W Merrill, A Sabharwal - arXiv preprint arXiv:2310.07923, 2023 - arxiv.org
Recent theoretical work has identified surprisingly simple reasoning problems, such as
checking if two nodes in a graph are connected or simulating finite-state machines, that are …

What formal languages can transformers express? A survey

L Strobl, W Merrill, G Weiss, D Chiang… - Transactions of the …, 2024 - direct.mit.edu
As transformers have gained prominence in natural language processing, some researchers
have investigated theoretically what problems they can and cannot solve, by treating …

A logic for expressing log-precision transformers

W Merrill, A Sabharwal - Advances in Neural Information …, 2024 - proceedings.neurips.cc
One way to interpret the reasoning power of transformer-based language models is to
describe the types of logical rules they can resolve over some input text. Recently, Chiang et …

xLSTM: Extended Long Short-Term Memory

M Beck, K Pöppel, M Spanring, A Auer… - arXiv preprint arXiv …, 2024 - arxiv.org
In the 1990s, the constant error carousel and gating were introduced as the central ideas of
the Long Short-Term Memory (LSTM). Since then, LSTMs have stood the test of time and …

The illusion of state in state-space models

W Merrill, J Petty, A Sabharwal - arXiv preprint arXiv:2404.08819, 2024 - arxiv.org
State-space models (SSMs) have emerged as a potential alternative architecture for building
large language models (LLMs) compared to the previously ubiquitous transformer …