Deep learning for credit card fraud detection: A review of algorithms, challenges, and solutions

ID Mienye, N Jere - IEEE Access, 2024 - ieeexplore.ieee.org
Deep learning (DL), a branch of machine learning (ML), is the core technology in today's
technological advancements and innovations. Deep learning-based approaches are the …

Scan and snap: Understanding training dynamics and token composition in 1-layer transformer

Y Tian, Y Wang, B Chen, SS Du - Advances in Neural …, 2023 - proceedings.neurips.cc
Transformer architecture has shown impressive performance in multiple research domains
and has become the backbone of many neural network models. However, there is limited …

Transformers as support vector machines

DA Tarzanagh, Y Li, C Thrampoulidis… - arXiv preprint arXiv …, 2023 - arxiv.org
Since its inception in" Attention Is All You Need", transformer architecture has led to
revolutionary advancements in NLP. The attention layer within the transformer admits a …

JoMA: Demystifying multilayer transformers via joint dynamics of MLP and attention

Y Tian, Y Wang, Z Zhang, B Chen, S Du - arXiv preprint arXiv:2310.00535, 2023 - arxiv.org
We propose Joint MLP/Attention (JoMA) dynamics, a novel mathematical framework to
understand the training procedure of multilayer Transformer architectures. This is achieved …

How transformers learn causal structure with gradient descent

E Nichani, A Damian, JD Lee - arXiv preprint arXiv:2402.14735, 2024 - arxiv.org
The incredible success of transformers on sequence modeling tasks can be largely
attributed to the self-attention mechanism, which allows information to be transferred …

Mechanics of next token prediction with self-attention

Y Li, Y Huang, ME Ildiz, AS Rawat… - International …, 2024 - proceedings.mlr.press
Transformer-based language models are trained on large datasets to predict the next token
given an input sequence. Despite this simple training objective, they have led to …
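
For readers who want a concrete picture of the generic setup this entry names, the following is a minimal, illustrative sketch of next-token prediction with a single causal self-attention layer, written in NumPy. It is not the paper's construction or analysis; the toy vocabulary, dimensions, and weight names are all assumptions.

```python
# Illustrative sketch (assumptions, not the paper's model): one causal
# self-attention layer producing logits for the next token of a toy sequence.
import numpy as np

rng = np.random.default_rng(0)
V, d, T = 8, 16, 5                       # assumed vocab size, embedding dim, sequence length

E = rng.normal(size=(V, d))              # token embedding table
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Wo = rng.normal(size=(d, V))             # output projection to vocabulary logits

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def next_token_logits(tokens):
    """Return logits for the token following the given sequence."""
    X = E[tokens]                        # (T, d) embedded input sequence
    Q, K, Vals = X @ Wq, X @ Wk, X @ Wv  # queries, keys, values
    scores = Q @ K.T / np.sqrt(d)        # (T, T) attention scores
    mask = np.tril(np.ones((len(tokens), len(tokens))))
    scores = np.where(mask == 1, scores, -np.inf)   # causal masking
    A = softmax(scores, axis=-1)         # attention weights
    H = A @ Vals                         # attended representations
    return H[-1] @ Wo                    # logits read off the last position

tokens = rng.integers(0, V, size=T)
print("predicted next token:", int(np.argmax(next_token_logits(tokens))))
```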

[PDF] How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?

H Li, M Wang, S Lu, X Cui, PY Chen - arXiv preprint arXiv …, 2024 - researchgate.net
Transformer-based large language models have displayed impressive in-context learning
capabilities, where a pre-trained model can handle new tasks without fine-tuning by simply …

Training dynamics of multi-head softmax attention for in-context learning: Emergence, convergence, and optimality

S Chen, H Sheen, T Wang, Z Yang - arXiv preprint arXiv:2402.19442, 2024 - arxiv.org
We study the dynamics of gradient flow for training a multi-head softmax attention model for
in-context learning of multi-task linear regression. We establish the global convergence of …
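
As a rough illustration of the in-context multi-task linear regression setting named in this abstract (and not the paper's multi-head model or its gradient-flow analysis), the sketch below samples one weight vector per prompt and reads out a prediction for the query with a single softmax attention step over the demonstration pairs. The prompt format, the temperature beta, and the single-head readout are assumptions.

```python
# Illustrative sketch (assumptions): in-context linear regression prompts and a
# softmax-attention readout that weights each example's label by its
# similarity to the query (a kernel-regression-style estimate).
import numpy as np

rng = np.random.default_rng(1)
d, n_examples = 4, 20                    # assumed input dim, examples per prompt

def sample_prompt():
    """One task = one weight vector; a prompt holds (x_i, y_i) pairs plus a query x."""
    w = rng.normal(size=d)               # task-specific regression weights
    X = rng.normal(size=(n_examples, d))
    y = X @ w
    x_query = rng.normal(size=d)
    return X, y, x_query, x_query @ w    # last entry is the target label

def attention_predict(X, y, x_query, beta=5.0):
    """Softmax attention from the query to the in-context examples."""
    scores = beta * (X @ x_query)        # (n_examples,) attention scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ y                   # attention-weighted average of labels

X, y, x_q, target = sample_prompt()
print(f"prediction {attention_predict(X, y, x_q):.3f} vs target {target:.3f}")
```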

Attention with Markov: A framework for principled analysis of transformers via Markov chains

AV Makkuva, M Bondaschi, A Girish, A Nagle… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, attention-based transformers have achieved tremendous success across a
variety of disciplines including natural languages. A key ingredient behind their success is …

Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers

Y Jiang, G Rajendran, P Ravikumar… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have the capacity to store and recall facts. Through
experimentation with open-source models, we observe that this ability to retrieve facts can …