A survey of techniques for optimizing transformer inference

KT Chitty-Venkata, S Mittal, M Emani… - Journal of Systems …, 2023 - Elsevier
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …

Run, don't walk: chasing higher FLOPS for faster neural networks

J Chen, S Kao, H He, W Zhuo, S Wen… - Proceedings of the …, 2023 - openaccess.thecvf.com
To design fast neural networks, many works have been focusing on reducing the number of
floating-point operations (FLOPs). We observe that such reduction in FLOPs, however, does …

EfficientViT: Memory efficient vision transformer with cascaded group attention

X Liu, H Peng, N Zheng, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision transformers have shown great success due to their high model capabilities.
However, their remarkable performance is accompanied by heavy computation costs, which …

Rethinking vision transformers for mobilenet size and speed

Y Li, J Hu, Y Wen, G Evangelidis… - Proceedings of the …, 2023 - openaccess.thecvf.com
With the success of Vision Transformers (ViTs) in computer vision tasks, recent works try to
optimize the performance and complexity of ViTs to enable efficient deployment on mobile …

RepViT: Revisiting mobile CNN from ViT perspective

A Wang, H Chen, Z Lin, J Han… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Recently, lightweight Vision Transformers (ViTs) have demonstrated superior performance
and lower latency compared with lightweight Convolutional Neural Networks (CNNs) on …

Spike-driven transformer

M Yao, J Hu, Z Zhou, L Yuan, Y Tian… - Advances in neural …, 2024 - proceedings.neurips.cc
Spiking Neural Networks (SNNs) provide an energy-efficient deep learning option
due to their unique spike-based event-driven (i.e., spike-driven) paradigm. In this paper, we …

SnapFusion: Text-to-image diffusion model on mobile devices within two seconds

Y Li, H Wang, Q Jin, J Hu… - Advances in …, 2024 - proceedings.neurips.cc
Text-to-image diffusion models can create stunning images from natural language
descriptions that rival the work of professional artists and photographers. However, these …

FastViT: A fast hybrid vision transformer using structural reparameterization

PKA Vasu, J Gabriel, J Zhu, O Tuzel… - Proceedings of the …, 2023 - openaccess.thecvf.com
The recent amalgamation of transformer and convolutional designs has led to steady
improvements in accuracy and efficiency of the models. In this work, we introduce FastViT, a …

Faster Segment Anything: Towards lightweight SAM for mobile applications

C Zhang, D Han, Y Qiao, JU Kim, SH Bae… - arXiv preprint arXiv …, 2023 - arxiv.org
Segment Anything Model (SAM) is a prompt-guided vision foundation model for cutting out
the object of interest from its background. Since Meta research team released the SA project …

EfficientSAM: Leveraged masked image pretraining for efficient segment anything

Y Xiong, B Varadarajan, L Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Segment Anything Model (SAM) has emerged as a powerful tool for numerous
vision applications. A key component that drives the impressive performance for zero-shot …