Deep learning for credit card fraud detection: A review of algorithms, challenges, and solutions
Deep learning (DL), a branch of machine learning (ML), is a core technology behind today's
technological advancements and innovations. Deep learning-based approaches are the …
Scan and snap: Understanding training dynamics and token composition in 1-layer transformer
Transformer architecture has shown impressive performance in multiple research domains
and has become the backbone of many neural network models. However, there is limited …
Transformers as support vector machines
Since its inception in" Attention Is All You Need", transformer architecture has led to
revolutionary advancements in NLP. The attention layer within the transformer admits a …
JoMA: Demystifying multilayer transformers via joint dynamics of MLP and attention
We propose Joint MLP/Attention (JoMA) dynamics, a novel mathematical framework to
understand the training procedure of multilayer Transformer architectures. This is achieved …
How transformers learn causal structure with gradient descent
The incredible success of transformers on sequence modeling tasks can be largely
attributed to the self-attention mechanism, which allows information to be transferred …
Mechanics of next token prediction with self-attention
Transformer-based language models are trained on large datasets to predict the next token
given an input sequence. Despite this simple training objective, they have led to …
How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?
Transformer-based large language models have displayed impressive in-context learning
capabilities, where a pre-trained model can handle new tasks without fine-tuning by simply …
Training dynamics of multi-head softmax attention for in-context learning: Emergence, convergence, and optimality
We study the dynamics of gradient flow for training a multi-head softmax attention model for
in-context learning of multi-task linear regression. We establish the global convergence of …
Attention with markov: A framework for principled analysis of transformers via markov chains
In recent years, attention-based transformers have achieved tremendous success across a
variety of disciplines including natural languages. A key ingredient behind their success is …
Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
Large Language Models (LLMs) have the capacity to store and recall facts. Through
experimentation with open-source models, we observe that this ability to retrieve facts can …