Faith and fate: Limits of transformers on compositionality
Transformer large language models (LLMs) have sparked admiration for their exceptional
performance on tasks that demand intricate multi-step reasoning. Yet, these models …
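The compositional tasks this work probes (multi-digit multiplication is one of its running examples) decompose into chained subproblems whose results feed later steps. A minimal Python sketch of that structure, assuming the grade-school algorithm as the decomposition; illustrative only, not the paper's code:

```python
# Multi-digit multiplication as a compositional, multi-step task: the
# final answer is built by composing partial products across steps.
def long_multiply(a: int, b: int) -> int:
    total = 0
    for place, digit_char in enumerate(reversed(str(b))):
        partial = a * int(digit_char)   # one subproblem
        total += partial * 10 ** place  # compose it with earlier steps
    return total

assert long_multiply(123, 456) == 123 * 456
```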
Towards revealing the mystery behind chain of thought: a theoretical perspective
Recent studies have discovered that Chain-of-Thought prompting (CoT) can dramatically
improve the performance of Large Language Models (LLMs), particularly when dealing with …
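As a minimal sketch of the prompting pattern CoT refers to, with one worked exemplar followed by the query; `generate` is a hypothetical stand-in for whatever LLM completion API is in use:

```python
def cot_prompt(question: str) -> str:
    # One exemplar that spells out intermediate reasoning steps.
    exemplar = (
        "Q: A pen costs $2 and a notebook costs $3. "
        "What do 2 pens and 1 notebook cost?\n"
        "A: 2 pens cost 2 * 2 = 4 dollars. The notebook costs 3 dollars. "
        "4 + 3 = 7. The answer is 7.\n"
    )
    return exemplar + f"Q: {question}\nA: Let's think step by step."

# answer = generate(cot_prompt("What is 12 * 15 + 7?"))  # hypothetical call
```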
Foundational challenges in assuring alignment and safety of large language models
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …
In-context language learning: Architectures and algorithms
Large-scale neural language models exhibit a remarkable capacity for in-context learning
(ICL): they can infer novel functions from datasets provided as input. Most of our current …
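A minimal sketch of the ICL setup described here: the "dataset" is serialized into the prompt, and the model must infer the underlying function from the examples alone, with no weight updates. The affine target function is an assumed example:

```python
def icl_prompt(examples: list[tuple[int, int]], query: int) -> str:
    # Serialize (input, output) pairs, then pose an unanswered query.
    lines = [f"input: {x} -> output: {y}" for x, y in examples]
    lines.append(f"input: {query} -> output:")
    return "\n".join(lines)

# The rule y = 3x + 1 is never stated, only demonstrated.
prompt = icl_prompt([(1, 4), (2, 7), (5, 16)], query=10)
```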
Tighter bounds on the expressivity of transformer encoders
Characterizing neural networks in terms of better-understood formal systems has the
potential to yield new insights into the power and limitations of these networks. Doing so for …
The expressive power of transformers with chain of thought
W Merrill, A Sabharwal - arXiv preprint arXiv:2310.07923, 2023 - arxiv.org
Recent theoretical work has identified surprisingly simple reasoning problems, such as
checking if two nodes in a graph are connected or simulating finite-state machines, that are …
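For concreteness, one of the "surprisingly simple" problems named here, simulating a finite-state machine, takes a single sequential pass, which is exactly what intermediate chain-of-thought steps can supply; a small DFA simulator as an assumed illustration:

```python
# DFA over {0, 1} tracking the parity of 1s. Sequentially trivial, but a
# fixed-depth model must compose the whole transition sequence at once.
TRANSITIONS = {("even", "0"): "even", ("even", "1"): "odd",
               ("odd", "0"): "odd", ("odd", "1"): "even"}

def run_dfa(word: str, state: str = "even") -> str:
    for symbol in word:
        state = TRANSITIONS[(state, symbol)]
    return state

assert run_dfa("1101") == "odd"  # three 1s -> odd parity
```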
What formal languages can transformers express? A survey
As transformers have gained prominence in natural language processing, some researchers
have investigated theoretically what problems they can and cannot solve, by treating …
A logic for expressing log-precision transformers
W Merrill, A Sabharwal - Advances in Neural Information …, 2024 - proceedings.neurips.cc
One way to interpret the reasoning power of transformer-based language models is to
describe the types of logical rules they can resolve over some input text. Recently, Chiang et …
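As a hedged illustration of the kind of logical rule meant here, in the spirit of the first-order-with-majority logic this line of work uses (the concrete formula is ours, not the paper's):

```latex
% M is a majority quantifier over input positions; Q_a(i) holds when
% position i carries the symbol a.
\varphi \;=\; M i.\; Q_a(i)
% \varphi holds on a string w iff |\{\, i : w_i = a \,\}| \ge |w|/2.
```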
xLSTM: Extended Long Short-Term Memory
In the 1990s, the constant error carousel and gating were introduced as the central ideas of
the Long Short-Term Memory (LSTM). Since then, LSTMs have stood the test of time and …
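For reference, the two ideas the abstract names, in standard notation ($\sigma$ the logistic sigmoid, $\odot$ the elementwise product); the additive update of $c_t$ is the constant error carousel, letting gradients flow across time without repeated squashing:

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), &
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), &
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, &
h_t &= o_t \odot \tanh(c_t).
\end{aligned}
```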
The illusion of state in state-space models
State-space models (SSMs) have emerged as a potential alternative to the previously ubiquitous transformer architecture for building large language models (LLMs) …
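In its linear time-invariant form, the architecture in question reduces to a simple recurrence; a minimal NumPy sketch (shapes and names are assumptions for illustration, not the paper's code):

```python
import numpy as np

# Linear state-space layer in recurrent form:
#   h_t = A @ h_{t-1} + B @ x_t,    y_t = C @ h_t
# Linearity in h is what makes the scan parallelizable, and also what
# bounds the kind of state such models can genuinely track.
def ssm_scan(A, B, C, xs):
    h = np.zeros(A.shape[0])
    ys = []
    for x in xs:
        h = A @ h + B @ x
        ys.append(C @ h)
    return np.stack(ys)

rng = np.random.default_rng(0)
A = 0.1 * rng.normal(size=(4, 4))
B = rng.normal(size=(4, 2))
C = rng.normal(size=(1, 4))
ys = ssm_scan(A, B, C, rng.normal(size=(6, 2)))  # six steps, scalar outputs
```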