A survey of transformers

T Lin, Y Wang, X Liu, X Qiu - AI open, 2022 - Elsevier
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …

Transformers in time-series analysis: A tutorial

S Ahmed, IE Nielsen, A Tripathi, S Siddiqui… - Circuits, Systems, and …, 2023 - Springer
Transformer architectures have widespread applications, particularly in Natural Language
Processing and Computer Vision. Recently, Transformers have been employed in various …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

Scaling up your kernels to 31x31: Revisiting large kernel design in CNNs

X Ding, X Zhang, J Han, G Ding - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
We revisit large kernel design in modern convolutional neural networks (CNNs). Inspired by
recent advances in vision transformers (ViTs), in this paper, we demonstrate that using a few …

Diffusion policy: Visuomotor policy learning via action diffusion

C Chi, S Feng, Y Du, Z Xu, E Cousineau… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper introduces Diffusion Policy, a new way of generating robot behavior by
representing a robot's visuomotor policy as a conditional denoising diffusion process. We …

Fine-tuning language models with just forward passes

S Malladi, T Gao, E Nichani… - Advances in …, 2023 - proceedings.neurips.cc
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but
as LMs grow in size, backpropagation requires a prohibitively large amount of memory …

ActionFormer: Localizing moments of actions with transformers

CL Zhang, J Wu, Y Li - European Conference on Computer Vision, 2022 - Springer
Self-attention based Transformer models have demonstrated impressive results for image
classification and object detection, and more recently for video understanding. Inspired by …

High-resolution de novo structure prediction from primary sequence

R Wu, F Ding, R Wang, R Shen, X Zhang, S Luo, C Su… - bioRxiv, 2022 - biorxiv.org
Recent breakthroughs have used deep learning to exploit evolutionary information in
multiple sequence alignments (MSAs) to accurately predict protein structures. However …

H2O: Heavy-hitter oracle for efficient generative inference of large language models

Z Zhang, Y Sheng, T Zhou, T Chen… - Advances in …, 2024 - proceedings.neurips.cc
Large Language Models (LLMs), despite their recent impressive accomplishments,
are notably cost-prohibitive to deploy, particularly for applications involving long-content …

Are transformers more robust than CNNs?

Y Bai, J Mei, AL Yuille, C Xie - Advances in neural …, 2021 - proceedings.neurips.cc
Transformer emerges as a powerful tool for visual recognition. In addition to demonstrating
competitive performance on a broad range of visual benchmarks, recent works also argue …