Visual transformers: Token-based image representation and processing for computer vision

S Cong, Y Zhou - Artificial Intelligence Review, 2023 - Springer

The research advances concerning the typical architectures of convolutional neural
networks (CNNs) as well as their optimizations are analyzed and elaborated in detail in this …

Cited by 88 - Related articles - All 5 versions

Deep learning-based 3D point cloud classification: A systematic survey and outlook

H Zhang, C Wang, S Tian, B Lu, L Zhang, X Ning, X Bai - Displays, 2023 - Elsevier

In recent years, point cloud representation has become one of the research hotspots in the
field of computer vision, and has been widely used in many fields, such as autonomous …

Cited by 37 - Related articles - All 4 versions

Efficient long-range attention network for image super-resolution

X Zhang, H Zeng, S Guo, L Zhang - European conference on computer …, 2022 - Springer

Recently, transformer-based methods have demonstrated impressive results in various
vision tasks, including image super-resolution (SR), by exploiting the self-attention (SA) for …

Cited by 245 - Related articles - All 4 versions

SwinIR: Image restoration using Swin Transformer

J Liang, J Cao, G Sun, K Zhang… - Proceedings of the …, 2021 - openaccess.thecvf.com

Image restoration is a long-standing low-level vision problem that aims to restore high-quality
images from low-quality images (e.g., downscaled, noisy and compressed images) …

Cited by 2514 - Related articles - All 10 versions

Do vision transformers see like convolutional neural networks?

M Raghu, T Unterthiner, S Kornblith… - Advances in neural …, 2021 - proceedings.neurips.cc

Convolutional neural networks (CNNs) have so far been the de-facto model for visual data.
Recent work has shown that (Vision) Transformer models (ViT) can achieve comparable or …

Cited by 897 - Related articles - All 8 versions

SwinSUNet: Pure transformer network for remote sensing image change detection

C Zhang, L Wang, S Cheng, Y Li - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Convolutional neural network (CNN) can extract effective semantic features, so it was widely
used for remote sensing image change detection (CD) in the latest years. CNN has acquired …

Cited by 232 - Related articles - All 2 versions

A survey of visual transformers

Y Liu, Y Zhang, Y Wang, F Hou, J Yuan… - … on Neural Networks …, 2023 - ieeexplore.ieee.org

Transformer, an attention-based encoder–decoder model, has already revolutionized the
field of natural language processing (NLP). Inspired by such significant achievements, some …

Cited by 293 - Related articles - All 22 versions

Conformer: Local features coupling global representations for visual recognition

Z Peng, W Huang, S Gu, L Xie… - Proceedings of the …, 2021 - openaccess.thecvf.com

Within Convolutional Neural Network (CNN), the convolution operations are good
at extracting local features but experience difficulty to capture global representations. Within …

Cited by 615 - Related articles - All 14 versions

Multiscale vision transformers

H Fan, B Xiong, K Mangalam, Y Li… - Proceedings of the …, 2021 - openaccess.thecvf.com

We present Multiscale Vision Transformers (MViT) for video and image recognition,
by connecting the seminal idea of multiscale feature hierarchies with transformer models …

Cited by 1202 - Related articles - All 5 versions

Going deeper with image transformers

H Touvron, M Cord, A Sablayrolles… - Proceedings of the …, 2021 - openaccess.thecvf.com

Transformers have been recently adapted for large scale image classification, achieving
high scores shaking up the long supremacy of convolutional neural networks. However the …

Cited by 987 - Related articles - All 5 versions