SepViT: Separable vision transformer

R Azad, A Kazerouni, M Heidari, EK Aghdam… - Medical Image …, 2023 - Elsevier

The remarkable performance of the Transformer architecture in natural language processing
has recently also triggered broad interest in Computer Vision. Among other merits …

Cited by 71 - Related articles - All 7 versions

EfficientFormer: Vision transformers at MobileNet speed

Y Li, G Yuan, Y Wen, J Hu… - Advances in …, 2022 - proceedings.neurips.cc

Vision Transformers (ViT) have shown rapid progress in computer vision tasks,
achieving promising results on various benchmarks. However, due to the massive number of …

Cited by 280 - Related articles - All 6 versions

Rethinking vision transformers for MobileNet size and speed

Y Li, J Hu, Y Wen, G Evangelidis… - Proceedings of the …, 2023 - openaccess.thecvf.com

With the success of Vision Transformers (ViTs) in computer vision tasks, recent arts try to
optimize the performance and complexity of ViTs to enable efficient deployment on mobile …

Cited by 124 - Related articles - All 5 versions

Next-ViT: Next generation vision transformer for efficient deployment in realistic industrial scenarios

J Li, X Xia, W Li, H Li, X Wang, X Xiao, R Wang… - arXiv preprint arXiv …, 2022 - arxiv.org

Due to the complex attention mechanisms and model design, most existing vision
Transformers (ViTs) can not perform as efficiently as convolutional neural networks (CNNs) …

Cited by 155 - Related articles - All 2 versions

Swin3D: A pretrained transformer backbone for 3D indoor scene understanding

YQ Yang, YX Guo, JY Xiong, Y Liu, H Pan… - arXiv preprint arXiv …, 2023 - arxiv.org

The use of pretrained backbones with fine-tuning has been successful for 2D vision and
natural language processing tasks, showing advantages over task-specific networks. In this …

Cited by 51 - Related articles - All 2 versions

ElasticViT: Conflict-aware supernet training for deploying fast vision transformer on diverse mobile devices

C Tang, LL Zhang, H Jiang, J Xu… - Proceedings of the …, 2023 - openaccess.thecvf.com

Neural Architecture Search (NAS) has shown promising performance in the
automatic design of vision transformers (ViT) exceeding 1G FLOPs. However, designing …

Cited by 15 - Related articles - All 6 versions

A CNN-Transformer hybrid model based on CSWin Transformer for UAV image object detection

W Lu, C Lan, C Niu, W Liu, L Lyu… - IEEE Journal of …, 2023 - ieeexplore.ieee.org

The object detection of unmanned aerial vehicle (UAV) images has widespread applications
in numerous fields; however, the complex background, diverse scales, and uneven …

Cited by 27 - Related articles - All 2 versions

Light-YOLOv5: A lightweight algorithm for improved YOLOv5 in complex fire scenarios

H Xu, B Li, F Zhong - Applied Sciences, 2022 - mdpi.com

Fire-detection technology is of great importance for successful fire-prevention measures.
Image-based fire detection is one effective method. At present, object-detection algorithms …

Cited by 34 - Related articles - All 7 versions

SDBAD-Net: A spatial dual-branch attention dehazing network based on meta-former paradigm

G Zhang, W Fang, Y Zheng… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Image dehazing is an emblematical low-level vision task that aims at restoring haze-free
images from haze images. Recently, some methods adopt deep learning techniques to …

Cited by 14 - Related articles - All 2 versions

TRT-ViT: TensorRT-oriented vision transformer

X Xia, J Li, J Wu, X Wang, X Xiao, M Zheng… - arXiv preprint arXiv …, 2022 - arxiv.org

We revisit the existing excellent Transformers from the perspective of practical application.
Most of them are not even as efficient as the basic ResNets series and deviate from the …

Cited by 31 - Related articles - All 3 versions