Delving deep into the generalization of vision transformers under distribution shifts

F Shamshad, S Khan, SW Zamir, MH Khan… - Medical Image …, 2023 - Elsevier

Following unprecedented success on the natural language tasks, Transformers have been
successfully applied to several computer vision problems, achieving state-of-the-art results …

Cited by 509 · Related articles · All 9 versions

A survey on deep learning-based monocular spacecraft pose estimation: Current state, limitations and prospects

L Pauly, W Rharbaoui, C Shneider, A Rathinam… - Acta Astronautica, 2023 - Elsevier

Estimating the pose of an uncooperative spacecraft is an important computer vision problem
for enabling the deployment of automatic vision-based systems in orbit, with applications …

Cited by 24 · Related articles · All 5 versions

ClimaX: A foundation model for weather and climate

T Nguyen, J Brandstetter, A Kapoor, JK Gupta… - arXiv preprint arXiv …, 2023 - arxiv.org

Most state-of-the-art approaches for weather and climate modeling are based on physics-
informed numerical models of the atmosphere. These approaches aim to model the non …

Cited by 163 · Related articles · All 8 versions

Are transformers more robust than CNNs?

Y Bai, J Mei, AL Yuille, C Xie - Advances in neural …, 2021 - proceedings.neurips.cc

Transformer emerges as a powerful tool for visual recognition. In addition to demonstrating
competitive performance on a broad range of visual benchmarks, recent works also argue …

Cited by 276 · Related articles · All 11 versions

Video transformers: A survey

J Selva, AS Johansen, S Escalera… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org

Transformer models have shown great success handling long-range interactions, making
them a promising tool for modeling video. However, they lack inductive biases and scale …

Cited by 93 · Related articles · All 8 versions

Part-aware transformer for generalizable person re-identification

H Ni, Y Li, L Gao, HT Shen… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Domain generalization person re-identification (DG ReID) aims to train a model on
source domains and generalize well on unseen domains. Vision Transformer usually yields …

Cited by 24 · Related articles · All 5 versions

A survey of the vision transformers and their CNN-transformer based variants

A Khan, Z Rauf, A Sohail, AR Khan, H Asif… - Artificial Intelligence …, 2023 - Springer

Vision transformers have become popular as a possible substitute to convolutional neural
networks (CNNs) for a variety of computer vision applications. These transformers, with their …

Cited by 42 · Related articles · All 6 versions

Delving into masked autoencoders for multi-label thorax disease classification

J Xiao, Y Bai, A Yuille, Z Zhou - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Vision Transformer (ViT) has become one of the most popular neural architectures
due to its simplicity, scalability, and compelling performance in multiple vision tasks …

Cited by 52 · Related articles · All 5 versions

An impartial take to the CNN vs transformer robustness contest

F Pinto, PHS Torr, PK Dokania - European Conference on Computer …, 2022 - Springer

Following the surge of popularity of Transformers in Computer Vision, several studies have
attempted to determine whether they could be more robust to distribution shifts and provide …

Cited by 48 · Related articles · All 6 versions

A closer look at the robustness of contrastive language-image pre-training (CLIP)

W Tu, W Deng, T Gedeon - Advances in Neural Information …, 2024 - proceedings.neurips.cc

Contrastive Language-Image Pre-training (CLIP) models have demonstrated
remarkable generalization capabilities across multiple challenging distribution shifts …

Cited by 13 · Related articles · All 7 versions