MViTv2: Improved multiscale vision transformers for classification and detection
In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for
image and video classification, as well as object detection. We present an improved version …
Multiscale vision transformers
Abstract We present Multiscale Vision Transformers (MViT) for video and image recognition,
by connecting the seminal idea of multiscale feature hierarchies with transformer models …
Video Swin Transformer
The vision community is witnessing a modeling shift from CNNs to Transformers, where pure
Transformer architectures have attained top accuracy on the major video recognition …
RegionViT: Regional-to-local attention for vision transformers
Vision transformer (ViT) has recently shown its strong capability in achieving comparable
results to convolutional neural networks (CNNs) on image classification. However, vanilla …
Multiview transformers for video recognition
Video understanding requires reasoning at multiple spatiotemporal resolutions — from short,
fine-grained motions to events taking place over longer durations. Although transformer …
MPViT: Multi-path vision transformer for dense prediction
Dense computer vision tasks such as object detection and segmentation require effective
multi-scale feature representation for detecting or classifying objects or regions with varying …
Mobile-Former: Bridging MobileNet and transformer
Abstract We present Mobile-Former, a parallel design of MobileNet and transformer with a
two-way bridge in between. This structure leverages the advantages of MobileNet at local …
VSA: Learning varied-size window attention in vision transformers
Attention within windows has been widely explored in vision transformers to balance
performance, computation complexity, and memory footprint. However, current models adopt …
MaxViT: Multi-axis vision transformer
Transformers have recently gained significant attention in the computer vision community.
However, the lack of scalability of self-attention mechanisms with respect to image size has …
TransMix: Attend to mix for vision transformers
Mixup-based augmentation has been found to be effective for generalizing models during
training, especially for Vision Transformers (ViTs), since they can easily overfit. However …