A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks with different data modalities. A PFM (e.g., BERT, ChatGPT, and GPT-4) is …
Transformers in medical imaging: A survey
Following unprecedented success on natural language tasks, Transformers have been
successfully applied to several computer vision problems, achieving state-of-the-art results …
Run, don't walk: Chasing higher FLOPS for faster neural networks
To design fast neural networks, many works have been focusing on reducing the number of
floating-point operations (FLOPs). We observe that such reduction in FLOPs, however, does …
EfficientViT: Memory-efficient vision transformer with cascaded group attention
Vision transformers have shown great success due to their high model capabilities.
However, their remarkable performance is accompanied by heavy computation costs, which …
DeiT III: Revenge of the ViT
A Vision Transformer (ViT) is a simple neural architecture amenable to serve
several computer vision tasks. It has limited built-in architectural priors, in contrast to more …
DaViT: Dual attention vision transformers
In this work, we introduce Dual Attention Vision Transformers (DaViT), a simple yet effective
vision transformer architecture that is able to capture global context while maintaining …
Rethinking vision transformers for mobilenet size and speed
With the success of Vision Transformers (ViTs) in computer vision tasks, recent works try to
optimize the performance and complexity of ViTs to enable efficient deployment on mobile …
TopFormer: Token pyramid transformer for mobile semantic segmentation
Although vision transformers (ViTs) have achieved great success in computer vision, the
heavy computational cost hampers their applications to dense prediction tasks such as …
TinyViT: Fast pretraining distillation for small vision transformers
The vision transformer (ViT) has recently drawn great attention in computer vision due to its
remarkable model capability. However, most prevailing ViT models suffer from a huge number …
UniFormer: Unifying convolution and self-attention for visual recognition
It is challenging to learn discriminative representations from images and videos, due to
large local redundancy and complex global dependency in these visual data. Convolution …