Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers

Y Jiang, G Rajendran, P Ravikumar… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have the capacity to store and recall facts. Through
experimentation with open-source models, we observe that this ability to retrieve facts can …
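
For readers unfamiliar with the associative-memory framing in the title, the sketch below shows the classical linear associative memory: key-value pairs stored as a sum of outer products and retrieved by a single matrix multiplication. This is only a generic textbook construction, not the paper's model of fact recall in transformers; the dimensions and random vectors are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 256                      # embedding dimension (illustrative choice)
    n_facts = 20                 # number of stored key-value pairs

    # Random keys (near-orthogonal in high dimension) and values.
    keys = rng.standard_normal((n_facts, d)) / np.sqrt(d)
    values = rng.standard_normal((n_facts, d))

    # Store all pairs in one weight matrix as a sum of outer products v k^T.
    W = sum(np.outer(v, k) for k, v in zip(keys, values))

    # Retrieval: multiplying by a stored key approximately returns its value,
    # up to interference from the other stored pairs.
    query = keys[3]
    retrieved = W @ query
    cos = retrieved @ values[3] / (np.linalg.norm(retrieved) * np.linalg.norm(values[3]))
    print(f"cosine similarity with the stored value: {cos:.2f}")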

Large language models as markov chains

O Zekri, A Odonnat, A Benechehab, L Bleistein… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have proven to be remarkably efficient, both across a wide
range of natural language processing tasks and well beyond them. However, a …
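
The framing in the title can be illustrated with a toy construction: an autoregressive model over a finite vocabulary with a bounded context window induces a Markov chain whose states are the context windows themselves. The next-token distribution below is made up purely to show this correspondence and does not reproduce the paper's analysis.

    import itertools
    import numpy as np

    vocab = ["a", "b"]           # toy vocabulary (illustrative)
    k = 2                        # context window length (illustrative)

    def next_token_probs(context):
        """A made-up autoregressive next-token distribution p(x_t | last k tokens)."""
        bias = 0.8 if context[-1] == "a" else 0.3
        return {"a": bias, "b": 1.0 - bias}

    # States of the induced Markov chain are all length-k contexts.
    states = ["".join(s) for s in itertools.product(vocab, repeat=k)]
    index = {s: i for i, s in enumerate(states)}

    # Appending a sampled token and dropping the oldest one maps one context
    # window to the next, so the chain's transition matrix P is row-stochastic.
    P = np.zeros((len(states), len(states)))
    for s in states:
        for tok, p in next_token_probs(s).items():
            P[index[s], index[s[1:] + tok]] += p

    print(states)
    print(P)                      # each row sums to 1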

How Do Nonlinear Transformers Acquire Generalization-Guaranteed CoT Ability?

H Li, M Wang, S Lu, X Cui, PY Chen - High-dimensional Learning …, 2024 - openreview.net
Chain-of-Thought (CoT) is an efficient prompting method that enables the reasoning ability
of large language models by augmenting the query using multiple examples with …
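
The mechanism described in this snippet, augmenting a query with worked examples that contain intermediate reasoning steps, amounts to plain prompt construction. The example problems and formatting below are invented for illustration and are not taken from the paper.

    # Hypothetical few-shot examples; each pairs a question with intermediate
    # reasoning steps and a final answer, as in standard CoT prompting.
    examples = [
        {
            "question": "A shop sells pens at 3 dollars each. How much do 4 pens cost?",
            "steps": "Each pen costs 3 dollars, so 4 pens cost 4 * 3 = 12 dollars.",
            "answer": "12",
        },
        {
            "question": "Tom had 10 apples and gave away 4. How many remain?",
            "steps": "He starts with 10 and removes 4, so 10 - 4 = 6 remain.",
            "answer": "6",
        },
    ]

    def build_cot_prompt(query, examples):
        """Prepend worked examples (question, reasoning, answer) to the new query."""
        blocks = [
            f"Q: {ex['question']}\nReasoning: {ex['steps']}\nA: {ex['answer']}"
            for ex in examples
        ]
        blocks.append(f"Q: {query}\nReasoning:")
        return "\n\n".join(blocks)

    print(build_cot_prompt("A train travels 60 km per hour for 2 hours. How far does it go?", examples))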

On the Power of Convolution Augmented Transformer

M Li, X Zhang, Y Huang, S Oymak - arXiv preprint arXiv:2407.05591, 2024 - arxiv.org
The transformer architecture has catalyzed revolutionary advances in language modeling.
However, recent architectural recipes, such as state-space models, have bridged the …

Optimizing Attention with Mirror Descent: Generalized Max-Margin Token Selection

AAK Julistiono, DA Tarzanagh, N Azizan - arXiv preprint arXiv:2410.14581, 2024 - arxiv.org
Attention mechanisms have revolutionized several domains of artificial intelligence, such as
natural language processing and computer vision, by enabling models to selectively focus …
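
As a minimal illustration of attention "selectively focusing" on tokens, the sketch below computes standard softmax attention weights for a single query over a handful of key vectors. It is generic scaled dot-product attention, not the paper's mirror-descent or max-margin analysis; dimensions and values are arbitrary.

    import numpy as np

    rng = np.random.default_rng(1)
    d = 8                                  # head dimension (arbitrary)
    n_tokens = 5

    keys = rng.standard_normal((n_tokens, d))
    values = rng.standard_normal((n_tokens, d))
    query = keys[2] + 0.1 * rng.standard_normal(d)   # query aligned with token 2

    # Scaled dot-product attention: scores -> softmax weights -> weighted sum.
    scores = keys @ query / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    output = weights @ values              # context vector dominated by token 2's value

    print("attention weights:", np.round(weights, 3))   # mass typically concentrates on token 2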

Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis

H Li, M Wang, S Lu, X Cui, PY Chen - arXiv preprint arXiv:2410.02167, 2024 - arxiv.org
Chain-of-Thought (CoT) is an efficient prompting method that enables the reasoning ability
of large language models by augmenting the query using multiple examples with multiple …

Local to Global: Learning Dynamics and Effect of Initialization for Transformers

AV Makkuva, M Bondaschi, C Ekbote, A Girish… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, transformer-based models have revolutionized deep learning, particularly in
sequence modeling. To better understand this phenomenon, there is a growing interest in …

Achieving the Tightest Relaxation of Sigmoids for Formal Verification

S Chevalier, D Starkenburg, K Dvijotham - arXiv preprint arXiv:2408.10491, 2024 - arxiv.org
In the field of formal verification, Neural Networks (NNs) are typically reformulated into
equivalent mathematical programs which are optimized over. To overcome the inherent non …
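
To make the setting concrete: the sigmoid is a nonlinear, non-polyhedral activation, so verifiers replace it with sound bounds over each neuron's pre-activation interval. The sketch below shows the crudest such relaxation, constant interval bounds obtained from monotonicity; the paper's contribution is a much tighter relaxation, which is not reproduced here.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_interval_bounds(lower, upper):
        """Sound element-wise bounds on sigmoid over [lower, upper].

        Because sigmoid is monotonically increasing, its image of the interval
        is exactly [sigmoid(lower), sigmoid(upper)].  These constant bounds are
        valid but loose: they ignore how the output varies with the input
        inside the interval, which is why tighter linear relaxations matter.
        """
        return sigmoid(lower), sigmoid(upper)

    # Pre-activation interval for one neuron (illustrative numbers).
    l, u = -1.5, 2.0
    lb, ub = sigmoid_interval_bounds(l, u)
    print(f"sigmoid([{l}, {u}]) is contained in [{lb:.3f}, {ub:.3f}]")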