A survey of transformers

T Lin, Y Wang, X Liu, X Qiu - AI Open, 2022 - Elsevier
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …

Neural machine translation: A review

F Stahlberg - Journal of Artificial Intelligence Research, 2020 - jair.org
The field of machine translation (MT), the automatic translation of written text from one
natural language into another, has experienced a major paradigm shift in recent years …

Scaling up GANs for text-to-image synthesis

M Kang, JY Zhu, R Zhang, J Park… - Proceedings of the …, 2023 - openaccess.thecvf.com
The recent success of text-to-image synthesis has taken the world by storm and captured the
general public's imagination. From a technical standpoint, it also marked a drastic change in …

FlashAttention: Fast and memory-efficient exact attention with IO-awareness

T Dao, D Fu, S Ermon, A Rudra… - Advances in Neural …, 2022 - proceedings.neurips.cc
Transformers are slow and memory-hungry on long sequences, since the time and memory
complexity of self-attention are quadratic in sequence length. Approximate attention …
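
To make the quadratic cost concrete, here is a minimal sketch (not the paper's algorithm) of naive exact attention in PyTorch; the (seq_len, seq_len) score matrix it materializes is exactly what FlashAttention's tiled, IO-aware computation avoids. All shapes are illustrative assumptions.

    import torch

    def naive_attention(q, k, v):
        # q, k, v: (seq_len, d). Materializing the full score matrix makes both
        # time and memory quadratic in seq_len; FlashAttention computes the same
        # exact result block by block without ever storing this matrix.
        scores = q @ k.T / q.shape[-1] ** 0.5      # (seq_len, seq_len)
        return torch.softmax(scores, dim=-1) @ v

    q = k = v = torch.randn(1024, 64)
    out = naive_attention(q, k, v)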

MaxViT: Multi-axis vision transformer

Z Tu, H Talebi, H Zhang, F Yang, P Milanfar… - European conference on …, 2022 - Springer
Transformers have recently gained significant attention in the computer vision community.
However, the lack of scalability of self-attention mechanisms with respect to image size has …
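
As a hedged illustration of the multi-axis idea, the sketch below shows only the two token partitions MaxViT alternates between: local block attention within P x P windows and sparse global grid attention across dilated positions. The attention step itself is elided, and all sizes are assumptions.

    import torch

    x = torch.randn(1, 16, 16, 32)    # (batch, H, W, channels); sizes assumed
    B, H, W, C = x.shape
    P = 4                             # window / grid size

    # Block axis: contiguous P x P windows (local attention groups).
    blocks = x.reshape(B, H // P, P, W // P, P, C).permute(0, 1, 3, 2, 4, 5)
    blocks = blocks.reshape(-1, P * P, C)

    # Grid axis: every (H // P)-th position, so each group spans the whole
    # image (sparse, dilated global attention groups).
    grid = x.reshape(B, P, H // P, P, W // P, C).permute(0, 2, 4, 1, 3, 5)
    grid = grid.reshape(-1, P * P, C)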

Scaling up your kernels to 31x31: Revisiting large kernel design in CNNs

X Ding, X Zhang, J Han, G Ding - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
We revisit large kernel design in modern convolutional neural networks (CNNs). Inspired by
recent advances in vision transformers (ViTs), in this paper, we demonstrate that using a few …
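
A minimal sketch of the central ingredient, assuming PyTorch and made-up channel counts: a 31x31 depthwise convolution (one filter per channel keeps the parameter and FLOP cost manageable) followed by a 1x1 pointwise convolution for channel mixing.

    import torch
    import torch.nn as nn

    block = nn.Sequential(
        nn.Conv2d(64, 64, kernel_size=31, padding=15, groups=64),  # depthwise 31x31
        nn.Conv2d(64, 64, kernel_size=1),                          # pointwise mixing
    )
    y = block(torch.randn(1, 64, 56, 56))   # spatial size preserved by padding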

Efficiently modeling long sequences with structured state spaces

A Gu, K Goel, C Ré - arXiv preprint arXiv:2111.00396, 2021 - arxiv.org
A central goal of sequence modeling is designing a single principled model that can
address sequence data across a range of modalities and tasks, particularly on long-range …
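
The object being modeled is a linear state space. The sketch below runs the discrete recurrence x_k = A x_{k-1} + B u_k, y_k = C x_k with random matrices; S4's actual contributions, the structured (HiPPO-derived) parameterization of A and the fast convolutional evaluation, are not shown.

    import torch

    N, T = 16, 100                        # state size and sequence length (assumed)
    A = torch.randn(N, N) * 0.1           # unstructured stand-in for S4's matrix
    B, C = torch.randn(N, 1), torch.randn(1, N)

    x, ys = torch.zeros(N, 1), []
    for u in torch.randn(T):              # scalar input sequence
        x = A @ x + B * u                 # state update
        ys.append((C @ x).item())         # scalar readout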

MLP-Mixer: An all-MLP architecture for vision

IO Tolstikhin, N Houlsby, A Kolesnikov… - Advances in neural …, 2021 - proceedings.neurips.cc
Convolutional Neural Networks (CNNs) are the go-to model for computer vision.
Recently, attention-based networks, such as the Vision Transformer, have also become …
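
A compact sketch of one Mixer layer, with all dimensions assumed: a token-mixing MLP applied across the patch axis (via a transpose) and a channel-mixing MLP applied per token, each wrapped in a skip connection with layer norm, following the paper's overall recipe.

    import torch
    import torch.nn as nn

    tokens, channels, hidden = 196, 512, 256   # illustrative sizes

    class MixerLayer(nn.Module):
        def __init__(self):
            super().__init__()
            self.norm1, self.norm2 = nn.LayerNorm(channels), nn.LayerNorm(channels)
            self.token_mlp = nn.Sequential(
                nn.Linear(tokens, hidden), nn.GELU(), nn.Linear(hidden, tokens))
            self.channel_mlp = nn.Sequential(
                nn.Linear(channels, hidden), nn.GELU(), nn.Linear(hidden, channels))

        def forward(self, x):                  # x: (batch, tokens, channels)
            x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
            return x + self.channel_mlp(self.norm2(x))

    y = MixerLayer()(torch.randn(2, tokens, channels))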

Pay attention to MLPs

H Liu, Z Dai, D So, QV Le - Advances in neural information …, 2021 - proceedings.neurips.cc
Transformers have become one of the most important architectural innovations in deep
learning and have enabled many breakthroughs over the past few years. Here we propose a …
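
The proposal is gMLP, whose key component is a spatial gating unit. Below is a simplified sketch under assumed sizes: the channels are split in half, one half is projected across the token axis, and the result gates the other half element-wise (the paper additionally uses a careful near-identity initialization, omitted here).

    import torch
    import torch.nn as nn

    class SpatialGatingUnit(nn.Module):
        def __init__(self, tokens, channels):
            super().__init__()
            self.norm = nn.LayerNorm(channels // 2)
            self.spatial = nn.Linear(tokens, tokens)    # mixes across positions

        def forward(self, x):                           # x: (batch, tokens, channels)
            u, v = x.chunk(2, dim=-1)
            v = self.spatial(self.norm(v).transpose(1, 2)).transpose(1, 2)
            return u * v                                # element-wise gating

    out = SpatialGatingUnit(196, 512)(torch.randn(2, 196, 512))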

CvT: Introducing convolutions to vision transformers

H Wu, B Xiao, N Codella, M Liu, X Dai… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present in this paper a new architecture, named Convolutional vision Transformer (CvT),
that improves Vision Transformer (ViT) in performance and efficiency by introducing …
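
A hedged sketch of the two convolutional ingredients CvT introduces, with all sizes assumed: a strided convolution producing overlapping token embeddings, and a depthwise "convolutional projection" applied to the spatial token map before the attention projections.

    import torch
    import torch.nn as nn

    embed = nn.Conv2d(3, 64, kernel_size=7, stride=4, padding=2)        # token embedding
    conv_proj = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64)  # depthwise

    img = torch.randn(1, 3, 224, 224)
    tok = embed(img)                                   # (1, 64, 56, 56) token map
    q_in = conv_proj(tok).flatten(2).transpose(1, 2)   # (1, 3136, 64) query tokens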