Scaling up your kernels to 31x31: Revisiting large kernel design in CNNs
We revisit large kernel design in modern convolutional neural networks (CNNs). Inspired by
recent advances in vision transformers (ViTs), we demonstrate in this paper that using a few …
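The core idea is easy to prototype: a very large depthwise convolution serves as the spatial mixer, optionally with a small parallel kernel to ease optimization. A minimal PyTorch sketch follows; the block structure and sizes are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LargeKernelBlock(nn.Module):
    """Depthwise large-kernel spatial mixer, sketched after the large-kernel idea.

    A 31x31 depthwise conv mixes spatial context cheaply (parameters scale
    with channels, not channels^2); a parallel small kernel aids optimization.
    All sizes here are illustrative assumptions.
    """
    def __init__(self, dim: int, kernel_size: int = 31, small_kernel: int = 5):
        super().__init__()
        self.large = nn.Conv2d(dim, dim, kernel_size,
                               padding=kernel_size // 2, groups=dim)
        self.small = nn.Conv2d(dim, dim, small_kernel,
                               padding=small_kernel // 2, groups=dim)
        self.norm = nn.BatchNorm2d(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the block easy to optimize at depth.
        return x + self.norm(self.large(x) + self.small(x))

x = torch.randn(1, 64, 56, 56)
print(LargeKernelBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```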
More ConvNets in the 2020s: Scaling up kernels beyond 51x51 using sparsity
Transformers have quickly risen to prominence in computer vision since the emergence of
Vision Transformers (ViTs). The dominant role of convolutional neural networks (CNNs) …
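One way to keep kernels of this size tractable is to decompose a dense KxK depthwise kernel into two rectangular depthwise kernels. The sketch below illustrates that decomposition under assumed sizes; the dynamic-sparsity training schedule the title refers to is not reproduced here.

```python
import torch
import torch.nn as nn

class RectDecomposedKernel(nn.Module):
    """Sketch: approximate a huge KxK depthwise kernel with two rectangular
    depthwise kernels (Kxs and sxK). Exact sizes are assumptions."""
    def __init__(self, dim: int, k: int = 51, s: int = 5):
        super().__init__()
        self.kxs = nn.Conv2d(dim, dim, (k, s), padding=(k // 2, s // 2), groups=dim)
        self.sxk = nn.Conv2d(dim, dim, (s, k), padding=(s // 2, k // 2), groups=dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Summing the two rectangular responses covers both orientations
        # at far lower cost than a dense 51x51 kernel.
        return self.kxs(x) + self.sxk(x)
```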
CvT: Introducing convolutions to vision transformers
We present in this paper a new architecture, named Convolutional vision Transformer (CvT),
that improves Vision Transformer (ViT) in performance and efficiency by introducing …
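A recurring ingredient in such hybrids is a convolutional projection: tokens are reshaped into a 2D map so a depthwise convolution can inject local spatial context before attention. A hedged sketch, with illustrative dimensions:

```python
import torch
import torch.nn as nn

class ConvProjection(nn.Module):
    """Sketch of a convolutional projection for attention inputs: tokens are
    reshaped to a 2D map, locally mixed by a depthwise conv, then linearly
    projected per token. Dimensions are illustrative assumptions."""
    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, kernel_size,
                            padding=kernel_size // 2, groups=dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, tokens: torch.Tensor, h: int, w: int) -> torch.Tensor:
        b, n, c = tokens.shape            # (batch, h*w, dim)
        assert n == h * w
        x = tokens.transpose(1, 2).reshape(b, c, h, w)
        x = self.dw(x)                    # local spatial mixing
        x = x.reshape(b, c, h * w).transpose(1, 2)
        return self.proj(x)               # per-token linear projection
```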
CMT: Convolutional neural networks meet vision transformers
Vision transformers have been successfully applied to image recognition tasks due to their
ability to capture long-range dependencies within an image. However, there are still gaps in …
FastViT: A fast hybrid vision transformer using structural reparameterization
The recent amalgamation of transformer and convolutional designs has led to steady
improvements in the accuracy and efficiency of these models. In this work, we introduce FastViT, a …
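Structural reparameterization rests on a simple identity: a convolution followed by BatchNorm can be folded into a single convolution at inference time, so multi-branch training blocks collapse into one operator. A minimal sketch of that folding (the standard algebra, not FastViT's exact deployment code):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold a BatchNorm into the preceding conv: the algebra behind
    structural reparameterization (train multi-branch, deploy one conv)."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    std = (bn.running_var + bn.eps).sqrt()
    scale = bn.weight / std                        # per-channel gamma / std
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.copy_(bn.bias + (bias - bn.running_mean) * scale)
    return fused
```

A quick sanity check: in eval mode, `fuse_conv_bn(conv, bn)(x)` should match `bn(conv(x))` up to floating-point error.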
Patch slimming for efficient vision transformers
This paper studies the efficiency of vision transformers by excavating redundant
computation in given networks. The recent transformer architecture has demonstrated its …
DynamicViT: Efficient vision transformers with dynamic token sparsification
Attention is sparse in vision transformers. We observe that the final prediction in vision
transformers is based on only a subset of the most informative tokens, which is sufficient for …
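The mechanism can be sketched as scoring tokens and keeping the top-k; the actual method learns the selection end-to-end with a differentiable relaxation, so the hard top-k below is a simplification:

```python
import torch
import torch.nn as nn

class TokenPruner(nn.Module):
    """Sketch of token sparsification: score each token with a light head
    and keep only the top-k most informative ones for the remaining layers.
    Hard top-k here is a simplification of the learned, differentiable
    selection used in practice."""
    def __init__(self, dim: int, keep_ratio: float = 0.7):
        super().__init__()
        self.score = nn.Linear(dim, 1)
        self.keep_ratio = keep_ratio

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        b, n, c = tokens.shape
        k = max(1, int(n * self.keep_ratio))
        scores = self.score(tokens).squeeze(-1)        # (b, n)
        idx = scores.topk(k, dim=1).indices
        idx = idx.sort(dim=1).values                   # keep original token order
        return tokens.gather(1, idx.unsqueeze(-1).expand(-1, -1, c))
```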
How to train your ViT? Data, augmentation, and regularization in vision transformers
Vision Transformers (ViT) have been shown to attain highly competitive performance for a
wide range of vision applications, such as image classification, object detection and …
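The subject here is the training recipe rather than the architecture. Below is a sketch of the kind of augmentation/regularization pipeline involved, with illustrative settings that are assumptions rather than the paper's tuned values:

```python
from torchvision import transforms

# Illustrative ViT training augmentations; ops and magnitudes are assumptions.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandAugment(num_ops=2, magnitude=9),   # policy-style augmentation
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
    transforms.RandomErasing(p=0.25),                 # occlusion regularizer
])
```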
LightViT: Towards light-weight convolution-free vision transformers
Vision transformers (ViTs) are usually considered to be less light-weight than convolutional
neural networks (CNNs) due to the lack of inductive bias. Recent works thus resort to …
Vision transformer with progressive sampling
Transformers, with their powerful global relation modeling abilities, have recently been
introduced to fundamental computer vision tasks. As a typical example, the Vision Transformer …
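The idea is to replace the fixed patch grid with sampling points that are iteratively refined toward informative image regions. A hedged sketch using `grid_sample`, where the offset head and iteration count are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProgressiveSampler(nn.Module):
    """Sketch of progressive token sampling: repeatedly (1) sample features
    at the current points and (2) predict offsets that move the points
    toward informative regions. Offset head and iteration count are
    illustrative assumptions."""
    def __init__(self, dim: int, iters: int = 4):
        super().__init__()
        self.offset = nn.Linear(dim, 2)   # per-point (dx, dy) in [-1, 1] coords
        self.iters = iters

    def forward(self, feat: torch.Tensor, pts: torch.Tensor) -> torch.Tensor:
        # feat: (b, c, h, w) feature map; pts: (b, n, 2) coords in [-1, 1],
        # e.g. initialized as a regular grid via torch.meshgrid.
        for _ in range(self.iters):
            sampled = F.grid_sample(feat, pts.unsqueeze(2),
                                    align_corners=False)    # (b, c, n, 1)
            tokens = sampled.squeeze(-1).transpose(1, 2)    # (b, n, c)
            pts = (pts + self.offset(tokens)).clamp(-1, 1)  # refine locations
        return tokens
```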