Deep learning for credit card fraud detection: A review of algorithms, challenges, and solutions

ID Mienye, N Jere - IEEE Access, 2024 - ieeexplore.ieee.org
Deep learning (DL), a branch of machine learning (ML), is the core technology in today's
technological advancements and innovations. Deep learning-based approaches are the …

Scan and snap: Understanding training dynamics and token composition in 1-layer transformer

Y Tian, Y Wang, B Chen, SS Du - Advances in Neural …, 2023 - proceedings.neurips.cc
Transformer architecture has shown impressive performance in multiple research domains
and has become the backbone of many neural network models. However, there is limited …

Transformers as support vector machines

DA Tarzanagh, Y Li, C Thrampoulidis… - arXiv preprint arXiv …, 2023 - arxiv.org
Since its inception in" Attention Is All You Need", transformer architecture has led to
revolutionary advancements in NLP. The attention layer within the transformer admits a …

JoMA: Demystifying multilayer transformers via joint dynamics of MLP and attention

Y Tian, Y Wang, Z Zhang, B Chen, S Du - arXiv preprint arXiv:2310.00535, 2023 - arxiv.org
We propose Joint MLP/Attention (JoMA) dynamics, a novel mathematical framework to
understand the training procedure of multilayer Transformer architectures. This is achieved …

How transformers learn causal structure with gradient descent

E Nichani, A Damian, JD Lee - arXiv preprint arXiv:2402.14735, 2024 - arxiv.org
The incredible success of transformers on sequence modeling tasks can be largely
attributed to the self-attention mechanism, which allows information to be transferred …

Mechanics of next token prediction with self-attention

Y Li, Y Huang, ME Ildiz, AS Rawat… - International …, 2024 - proceedings.mlr.press
Transformer-based language models are trained on large datasets to predict the next token
given an input sequence. Despite this simple training objective, they have led to …
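
For readers who want a concrete picture of the generic setup this entry names, the following is a minimal, illustrative sketch of next-token prediction with a single causal self-attention layer, written in NumPy. It is not the paper's construction or analysis; the toy vocabulary, dimensions, and weight names are all assumptions.

```python
# Illustrative sketch (assumptions, not the paper's model): one causal
# self-attention layer producing logits for the next token of a toy sequence.
import numpy as np

rng = np.random.default_rng(0)
V, d, T = 8, 16, 5                       # assumed vocab size, embedding dim, sequence length

E = rng.normal(size=(V, d))              # token embedding table
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Wo = rng.normal(size=(d, V))             # output projection to vocabulary logits

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def next_token_logits(tokens):
    """Return logits for the token following the given sequence."""
    X = E[tokens]                        # (T, d) embedded input sequence
    Q, K, Vals = X @ Wq, X @ Wk, X @ Wv  # queries, keys, values
    scores = Q @ K.T / np.sqrt(d)        # (T, T) attention scores
    mask = np.tril(np.ones((len(tokens), len(tokens))))
    scores = np.where(mask == 1, scores, -np.inf)   # causal masking
    A = softmax(scores, axis=-1)         # attention weights
    H = A @ Vals                         # attended representations
    return H[-1] @ Wo                    # logits read off the last position

tokens = rng.integers(0, V, size=T)
print("predicted next token:", int(np.argmax(next_token_logits(tokens))))
```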

[PDF] How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?

H Li, M Wang, S Lu, X Cui, PY Chen - arXiv preprint arXiv …, 2024 - researchgate.net
Transformer-based large language models have displayed impressive in-context learning
capabilities, where a pre-trained model can handle new tasks without fine-tuning by simply …

Training dynamics of multi-head softmax attention for in-context learning: Emergence, convergence, and optimality

S Chen, H Sheen, T Wang, Z Yang - arXiv preprint arXiv:2402.19442, 2024 - arxiv.org
We study the dynamics of gradient flow for training a multi-head softmax attention model for
in-context learning of multi-task linear regression. We establish the global convergence of …
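
As a rough illustration of the in-context multi-task linear regression setting named in this abstract (and not the paper's multi-head model or its gradient-flow analysis), the sketch below samples one weight vector per prompt and reads out a prediction for the query with a single softmax attention step over the demonstration pairs. The prompt format, the temperature beta, and the single-head readout are assumptions.

```python
# Illustrative sketch (assumptions): in-context linear regression prompts and a
# softmax-attention readout that weights each example's label by its
# similarity to the query (a kernel-regression-style estimate).
import numpy as np

rng = np.random.default_rng(1)
d, n_examples = 4, 20                    # assumed input dim, examples per prompt

def sample_prompt():
    """One task = one weight vector; a prompt holds (x_i, y_i) pairs plus a query x."""
    w = rng.normal(size=d)               # task-specific regression weights
    X = rng.normal(size=(n_examples, d))
    y = X @ w
    x_query = rng.normal(size=d)
    return X, y, x_query, x_query @ w    # last entry is the target label

def attention_predict(X, y, x_query, beta=5.0):
    """Softmax attention from the query to the in-context examples."""
    scores = beta * (X @ x_query)        # (n_examples,) attention scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ y                   # attention-weighted average of labels

X, y, x_q, target = sample_prompt()
print(f"prediction {attention_predict(X, y, x_q):.3f} vs target {target:.3f}")
```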

Attention with Markov: A framework for principled analysis of transformers via Markov chains

AV Makkuva, M Bondaschi, A Girish, A Nagle… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, attention-based transformers have achieved tremendous success across a
variety of disciplines including natural languages. A key ingredient behind their success is …

Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers

Y Jiang, G Rajendran, P Ravikumar… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have the capacity to store and recall facts. Through
experimentation with open-source models, we observe that this ability to retrieve facts can …