Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
Large Language Models (LLMs) have the capacity to store and recall facts. Through experimentation with open-source models, we observe that this ability to retrieve facts can …
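The mechanism named in this title is often pictured as a softmax key-value lookup: attention retrieves a stored value whose key best matches the query. A toy numpy sketch of that textbook view (the dimensions, data, and sharpness parameter are invented; this is not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy associative memory: store 3 key -> value pairs, then retrieve
# with a softmax attention lookup over the stored keys.
d = 8
K = rng.normal(size=(3, d))   # stored keys, one per row
V = rng.normal(size=(3, d))   # stored values, one per row

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def retrieve(query, beta=8.0):
    # Similarity of the query to every stored key; beta sharpens the
    # softmax so retrieval approaches a hard nearest-key lookup.
    weights = softmax(beta * (K @ query))
    return weights @ V

# Query with a noisy copy of the second key: the retrieved vector
# should align with the second stored value.
q = K[1] + 0.1 * rng.normal(size=d)
out = retrieve(q)
cos = V @ out / (np.linalg.norm(V, axis=1) * np.linalg.norm(out))
print(np.round(cos, 2))  # middle entry should be close to 1
```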
Large Language Models as Markov Chains
Large language models (LLMs) have proven to be remarkably efficient, both across a wide range of natural language processing tasks and well beyond them. However, a …
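The correspondence suggested by this title can be made concrete: an autoregressive model over a finite vocabulary with a bounded context window induces a Markov chain whose states are the possible contexts. A toy sketch of that reduction, with a random stand-in for the model and made-up sizes:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

# Vocabulary size 3, context window 2: the chain's states are all
# length-2 contexts (sizes are made up for the example).
V, k = 3, 2
states = list(product(range(V), repeat=k))
idx = {s: i for i, s in enumerate(states)}

# Stand-in for the LLM: a random conditional distribution p(next | context).
logits = rng.normal(size=(len(states), V))
p_next = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Transition matrix of the induced chain: emitting token t maps
# context s to the shifted context s[1:] + (t,).
P = np.zeros((len(states), len(states)))
for s in states:
    for t in range(V):
        P[idx[s], idx[s[1:] + (t,)]] += p_next[idx[s], t]

print(np.allclose(P.sum(axis=1), 1.0))  # rows sum to 1: a valid Markov chain
```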
How Do Nonlinear Transformers Acquire Generalization-Guaranteed CoT Ability?
Chain-of-Thought (CoT) is an efficient prompting method that enables the reasoning ability of large language models by augmenting the query using multiple examples with …
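The prompting mechanism the snippet describes is plain prompt augmentation: worked examples whose answers spell out intermediate steps are prepended to the query. A minimal sketch of assembling such a prompt (the demonstrations and template are invented for illustration):

```python
# Build a Chain-of-Thought prompt: each demonstration pairs a question
# with intermediate reasoning steps before the final answer, and the
# test query is appended at the end for the model to continue.
demos = [
    ("If Tom has 3 apples and buys 4 more, how many does he have?",
     "He starts with 3 apples. Buying 4 more gives 3 + 4 = 7. The answer is 7."),
    ("A train travels 60 km in 1 hour. How far does it go in 3 hours?",
     "It covers 60 km per hour. Over 3 hours that is 60 * 3 = 180 km. The answer is 180."),
]

def cot_prompt(query, demos):
    parts = [f"Q: {q}\nA: {a}" for q, a in demos]
    parts.append(f"Q: {query}\nA:")   # model supplies its own steps here
    return "\n\n".join(parts)

print(cot_prompt("If 5 pens cost 10 dollars, what do 8 pens cost?", demos))
```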
On the Power of Convolution Augmented Transformer
The transformer architecture has catalyzed revolutionary advances in language modeling. However, recent architectural recipes, such as state-space models, have bridged the …
Optimizing Attention with Mirror Descent: Generalized Max-Margin Token Selection
AAK Julistiono, DA Tarzanagh, N Azizan - arXiv preprint arXiv:2410.14581, 2024
Attention mechanisms have revolutionized several domains of artificial intelligence, such as natural language processing and computer vision, by enabling models to selectively focus …
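For context, mirror descent generalizes gradient descent by stepping through a mirror map such as the gradient of an l_p potential. The sketch below shows that generic update on a toy logistic-regression objective; it is not the paper's attention parameterization or its specific algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Mirror descent with the l_p potential psi(w) = (1/p) * sum_i |w_i|^p.
# The mirror map grad(psi) and its inverse act elementwise; plain
# gradient descent is recovered at p = 2.
def mirror_map(w, p):
    return np.sign(w) * np.abs(w) ** (p - 1)

def inverse_mirror_map(u, p):
    return np.sign(u) * np.abs(u) ** (1.0 / (p - 1))

# Toy separable classification task with a logistic loss (made up;
# the paper applies such updates to attention weights instead).
X = rng.normal(size=(40, 5))
y = np.sign(X @ rng.normal(size=5))

def grad_logistic(w):
    m = y * (X @ w)                                # per-example margins
    s = np.exp(-np.logaddexp(0.0, m))              # stable 1 / (1 + e^m)
    return -(X.T @ (y * s)) / len(y)

p, eta = 1.5, 0.5
w = np.full(5, 1e-3)
for _ in range(2000):
    u = mirror_map(w, p) - eta * grad_logistic(w)  # step in the dual space
    w = inverse_mirror_map(u, p)                   # map back to the primal

print("min training margin:", float(np.min(y * (X @ w))))
```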
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
Chain-of-Thought (CoT) is an efficient prompting method that enables the reasoning ability of large language models by augmenting the query using multiple examples with multiple …
Local to Global: Learning Dynamics and Effect of Initialization for Transformers
In recent years, transformer-based models have revolutionized deep learning, particularly in sequence modeling. To better understand this phenomenon, there is a growing interest in …
Achieving the Tightest Relaxation of Sigmoids for Formal Verification
S Chevalier, D Starkenburg, K Dvijotham - arXiv preprint arXiv:2408.10491, 2024
In the field of formal verification, Neural Networks (NNs) are typically reformulated into equivalent mathematical programs which are optimized over. To overcome the inherent non …
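For context, a common baseline relaxation replaces the sigmoid on an input interval [l, u] with sound linear bounds: on a region where the sigmoid is convex (u <= 0), the chord over-approximates it and any tangent under-approximates it. A minimal sketch of that baseline (not the paper's tightest relaxation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_bounds_convex(l, u):
    """Sound linear bounds for sigmoid on [l, u] with u <= 0, where the
    sigmoid is convex: the chord is an upper bound and the tangent at the
    midpoint is a lower bound. Returns (slope, intercept) pairs."""
    assert l < u <= 0
    # Upper bound: chord through (l, sigmoid(l)) and (u, sigmoid(u)).
    a_up = (sigmoid(u) - sigmoid(l)) / (u - l)
    b_up = sigmoid(l) - a_up * l
    # Lower bound: tangent at the midpoint m; sigmoid'(m) = s(m)(1 - s(m)).
    m = 0.5 * (l + u)
    a_lo = sigmoid(m) * (1.0 - sigmoid(m))
    b_lo = sigmoid(m) - a_lo * m
    return (a_lo, b_lo), (a_up, b_up)

(a_lo, b_lo), (a_up, b_up) = sigmoid_bounds_convex(-4.0, -1.0)
xs = np.linspace(-4.0, -1.0, 1000)
print(np.all(a_lo * xs + b_lo <= sigmoid(xs) + 1e-12),
      np.all(sigmoid(xs) <= a_up * xs + b_up + 1e-12))
```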