Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers

Y Jiang, G Rajendran, P Ravikumar… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have the capacity to store and recall facts. Through
experimentation with open-source models, we observe that this ability to retrieve facts can …
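
For readers unfamiliar with the associative-memory framing in the title, the sketch below shows the classical linear associative memory: key-value pairs stored as a sum of outer products and retrieved by a single matrix multiplication. This is only a generic textbook construction, not the paper's model of fact recall in transformers; the dimensions and random vectors are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 256                      # embedding dimension (illustrative choice)
    n_facts = 20                 # number of stored key-value pairs

    # Random keys (near-orthogonal in high dimension) and values.
    keys = rng.standard_normal((n_facts, d)) / np.sqrt(d)
    values = rng.standard_normal((n_facts, d))

    # Store all pairs in one weight matrix as a sum of outer products v k^T.
    W = sum(np.outer(v, k) for k, v in zip(keys, values))

    # Retrieval: multiplying by a stored key approximately returns its value,
    # up to interference from the other stored pairs.
    query = keys[3]
    retrieved = W @ query
    cos = retrieved @ values[3] / (np.linalg.norm(retrieved) * np.linalg.norm(values[3]))
    print(f"cosine similarity with the stored value: {cos:.2f}")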

Large language models as markov chains

O Zekri, A Odonnat, A Benechehab, L Bleistein… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have proven to be remarkably efficient, both across a wide
range of natural language processing tasks and well beyond them. However, a …
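
The framing in the title can be illustrated with a toy construction: an autoregressive model over a finite vocabulary with a bounded context window induces a Markov chain whose states are the context windows themselves. The next-token distribution below is made up purely to show this correspondence and does not reproduce the paper's analysis.

    import itertools
    import numpy as np

    vocab = ["a", "b"]           # toy vocabulary (illustrative)
    k = 2                        # context window length (illustrative)

    def next_token_probs(context):
        """A made-up autoregressive next-token distribution p(x_t | last k tokens)."""
        bias = 0.8 if context[-1] == "a" else 0.3
        return {"a": bias, "b": 1.0 - bias}

    # States of the induced Markov chain are all length-k contexts.
    states = ["".join(s) for s in itertools.product(vocab, repeat=k)]
    index = {s: i for i, s in enumerate(states)}

    # Appending a sampled token and dropping the oldest one maps one context
    # window to the next, so the chain's transition matrix P is row-stochastic.
    P = np.zeros((len(states), len(states)))
    for s in states:
        for tok, p in next_token_probs(s).items():
            P[index[s], index[s[1:] + tok]] += p

    print(states)
    print(P)                      # each row sums to 1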

How Do Nonlinear Transformers Acquire Generalization-Guaranteed CoT Ability?

H Li, M Wang, S Lu, X Cui, PY Chen - High-dimensional Learning …, 2024 - openreview.net
Chain-of-Thought (CoT) is an efficient prompting method that enables the reasoning ability
of large language models by augmenting the query using multiple examples with …
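
The mechanism described in this snippet, augmenting a query with worked examples that contain intermediate reasoning steps, amounts to plain prompt construction. The example problems and formatting below are invented for illustration and are not taken from the paper.

    # Hypothetical few-shot examples; each pairs a question with intermediate
    # reasoning steps and a final answer, as in standard CoT prompting.
    examples = [
        {
            "question": "A shop sells pens at 3 dollars each. How much do 4 pens cost?",
            "steps": "Each pen costs 3 dollars, so 4 pens cost 4 * 3 = 12 dollars.",
            "answer": "12",
        },
        {
            "question": "Tom had 10 apples and gave away 4. How many remain?",
            "steps": "He starts with 10 and removes 4, so 10 - 4 = 6 remain.",
            "answer": "6",
        },
    ]

    def build_cot_prompt(query, examples):
        """Prepend worked examples (question, reasoning, answer) to the new query."""
        blocks = [
            f"Q: {ex['question']}\nReasoning: {ex['steps']}\nA: {ex['answer']}"
            for ex in examples
        ]
        blocks.append(f"Q: {query}\nReasoning:")
        return "\n\n".join(blocks)

    print(build_cot_prompt("A train travels 60 km per hour for 2 hours. How far does it go?", examples))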

On the Power of Convolution Augmented Transformer

M Li, X Zhang, Y Huang, S Oymak - arXiv preprint arXiv:2407.05591, 2024 - arxiv.org
The transformer architecture has catalyzed revolutionary advances in language modeling.
However, recent architectural recipes, such as state-space models, have bridged the …

Optimizing Attention with Mirror Descent: Generalized Max-Margin Token Selection

AAK Julistiono, DA Tarzanagh, N Azizan - arXiv preprint arXiv:2410.14581, 2024 - arxiv.org
Attention mechanisms have revolutionized several domains of artificial intelligence, such as
natural language processing and computer vision, by enabling models to selectively focus …
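
As a minimal illustration of attention "selectively focusing" on tokens, the sketch below computes standard softmax attention weights for a single query over a handful of key vectors. It is generic scaled dot-product attention, not the paper's mirror-descent or max-margin analysis; dimensions and values are arbitrary.

    import numpy as np

    rng = np.random.default_rng(1)
    d = 8                                  # head dimension (arbitrary)
    n_tokens = 5

    keys = rng.standard_normal((n_tokens, d))
    values = rng.standard_normal((n_tokens, d))
    query = keys[2] + 0.1 * rng.standard_normal(d)   # query aligned with token 2

    # Scaled dot-product attention: scores -> softmax weights -> weighted sum.
    scores = keys @ query / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    output = weights @ values              # context vector dominated by token 2's value

    print("attention weights:", np.round(weights, 3))   # mass typically concentrates on token 2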

Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis

H Li, M Wang, S Lu, X Cui, PY Chen - arXiv preprint arXiv:2410.02167, 2024 - arxiv.org
Chain-of-Thought (CoT) is an efficient prompting method that enables the reasoning ability
of large language models by augmenting the query using multiple examples with multiple …

Local to Global: Learning Dynamics and Effect of Initialization for Transformers

AV Makkuva, M Bondaschi, C Ekbote, A Girish… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, transformer-based models have revolutionized deep learning, particularly in
sequence modeling. To better understand this phenomenon, there is a growing interest in …

Achieving the Tightest Relaxation of Sigmoids for Formal Verification

S Chevalier, D Starkenburg, K Dvijotham - arXiv preprint arXiv:2408.10491, 2024 - arxiv.org
In the field of formal verification, Neural Networks (NNs) are typically reformulated into
equivalent mathematical programs which are optimized over. To overcome the inherent non …
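
To make the setting concrete: the sigmoid is a nonlinear, non-polyhedral activation, so verifiers replace it with sound bounds over each neuron's pre-activation interval. The sketch below shows the crudest such relaxation, constant interval bounds obtained from monotonicity; the paper's contribution is a much tighter relaxation, which is not reproduced here.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_interval_bounds(lower, upper):
        """Sound element-wise bounds on sigmoid over [lower, upper].

        Because sigmoid is monotonically increasing, its image of the interval
        is exactly [sigmoid(lower), sigmoid(upper)].  These constant bounds are
        valid but loose: they ignore how the output varies with the input
        inside the interval, which is why tighter linear relaxations matter.
        """
        return sigmoid(lower), sigmoid(upper)

    # Pre-activation interval for one neuron (illustrative numbers).
    l, u = -1.5, 2.0
    lb, ub = sigmoid_interval_bounds(l, u)
    print(f"sigmoid([{l}, {u}]) is contained in [{lb:.3f}, {ub:.3f}]")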