A comprehensive survey on applications of transformers for deep learning tasks

S Islam, H Elmekki, A Elsebai, J Bentahar… - Expert Systems with Applications, 2024 - Elsevier
Transformers are Deep Neural Networks (DNNs) that utilize a self-attention
mechanism to capture contextual relationships within sequential data. Unlike traditional …
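
As an illustration of the self-attention mechanism the abstract refers to (not code from the survey itself), here is a minimal NumPy sketch of scaled dot-product self-attention; the weight matrices and toy dimensions are arbitrary choices for demonstration.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a token sequence X of shape (n, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # pairwise similarities, scaled by sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                           # each token: weighted mix of all values

# toy usage: 4 tokens, model width 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```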

A survey of techniques for optimizing transformer inference

KT Chitty-Venkata, S Mittal, M Emani… - Journal of Systems Architecture, 2023 - Elsevier
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …
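
As a concrete example of the kind of technique such surveys catalogue, below is a hedged NumPy sketch of KV caching, a standard autoregressive-inference optimization: keys and values for past tokens are computed once and reused, so each new token attends in O(n) instead of recomputing O(n²) work. The single toy attention head here is illustrative, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

K_cache, V_cache = [], []

def decode_step(x):                  # x: embedding of the newest token, shape (d,)
    K_cache.append(x @ Wk)           # cache this token's key and value...
    V_cache.append(x @ Wv)
    K, V = np.stack(K_cache), np.stack(V_cache)
    q = x @ Wq                       # ...and attend from the new query only
    return softmax(q @ K.T / np.sqrt(d)) @ V

for _ in range(5):                   # 5 decoding steps, each reusing the cache
    out = decode_step(rng.normal(size=d))
print(out.shape, len(K_cache))       # (8,) 5
```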

HyenaDNA: Long-range genomic sequence modeling at single nucleotide resolution

E Nguyen, M Poli, M Faizi, A Thomas… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Genomic (DNA) sequences encode an enormous amount of information for gene regulation
and protein synthesis. Similar to natural language models, researchers have proposed …

Video transformers: A survey

J Selva, AS Johansen, S Escalera… - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023 - ieeexplore.ieee.org
Transformer models have shown great success handling long-range interactions, making
them a promising tool for modeling video. However, they lack inductive biases and scale …

Uncovering mesa-optimization algorithms in transformers

J Von Oswald, M Schlegel, A Meulemans… - arXiv preprint arXiv …, 2023 - arxiv.org
Some autoregressive models exhibit in-context learning capabilities: the ability to learn as
an input sequence is processed, without undergoing any parameter changes, and without …
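
A toy sketch of the gradient-descent view of in-context learning that this line of work studies: one gradient step on an in-context least-squares problem coincides with a linear attention-style computation over the context. This is an illustrative reconstruction, not the paper's code; the learning rate eta and the data are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 4, 16
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))        # in-context example inputs
y = X @ w_true                     # their labels
x_q = rng.normal(size=d)           # query token
eta = 0.05

# (a) one explicit gradient-descent step on 0.5*||Xw - y||^2, starting from w = 0:
# the gradient at w=0 is -X.T @ y, so the updated weights are eta * X.T @ y
w1 = eta * X.T @ y
pred_gd = w1 @ x_q

# (b) the same prediction written as attention: values y_i weighted by
# unnormalized key-query dot products x_i . x_q
pred_attn = eta * y @ (X @ x_q)

print(np.allclose(pred_gd, pred_attn))  # True: the two computations are identical
```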

OmniVec: Learning robust representations with cross modal sharing

S Srivastava, G Sharma - Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024 - openaccess.thecvf.com
The majority of research in learning-based methods has focused on designing and training
networks for specific tasks. However, many of the learning-based tasks, across modalities …

Beyond efficiency: A systematic survey of resource-efficient large language models

G Bai, Z Chai, C Ling, S Wang, J Lu, N Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated
models like OpenAI's ChatGPT, represents a significant advancement in artificial …

Neural architecture search for transformers: A survey

KT Chitty-Venkata, M Emani, V Vishwanath… - IEEE Access, 2022 - ieeexplore.ieee.org
Transformer-based Deep Neural Network architectures have gained tremendous interest
due to their effectiveness in various applications across Natural Language Processing (NLP) …

E²VPT: An Effective and Efficient Approach for Visual Prompt Tuning

C Han, Q Wang, Y Cui, Z Cao, W Wang, S Qi… - arXiv preprint arXiv …, 2023 - arxiv.org
As the size of transformer-based models continues to grow, fine-tuning these large-scale
pretrained vision models for new tasks has become increasingly parameter-intensive …
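
For context, a minimal sketch of generic prompt tuning (not E²VPT's specific method): a handful of learnable prompt tokens are prepended to the input sequence and trained while the pretrained backbone stays frozen, keeping the trainable parameter count small. All names and sizes below are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_patches, n_prompts = 8, 16, 4

backbone_W = rng.normal(size=(d, d))              # stands in for frozen pretrained weights
prompts = rng.normal(size=(n_prompts, d)) * 0.01  # the only trainable parameters

def forward(patches, prompts):
    tokens = np.concatenate([prompts, patches], axis=0)  # (n_prompts + n_patches, d)
    return np.tanh(tokens @ backbone_W).mean(axis=0)     # frozen encoder + mean pooling

patches = rng.normal(size=(n_patches, d))
feat = forward(patches, prompts)
print(feat.shape, prompts.size, backbone_W.size)  # trainable: 32 params vs 64 frozen
```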

A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion models, and LLM-based multimodal models, are revolutionizing the entire machine …