A comprehensive survey of transformers for computer vision

S Jamil, M Jalil Piran, OJ Kwon - Drones, 2023 - mdpi.com
As a special type of transformer, vision transformers (ViTs) can be used for various computer
vision (CV) applications. Convolutional neural networks (CNNs) have several potential …

Backdoor defense via adaptively splitting poisoned dataset

K Gao, Y Bai, J Gu, Y Yang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Backdoor defenses have been studied to alleviate the threat of deep neural networks
(DNNs) being backdoor attacked and thus maliciously altered. Since DNNs usually adopt …

Towards efficient adversarial training on vision transformers

B Wu, J Gu, Z Li, D Cai, X He, W Liu - European Conference on Computer …, 2022 - Springer
Vision Transformer (ViT), as a powerful alternative to Convolutional Neural Network
(CNN), has received much attention. Recent work showed that ViTs are also vulnerable to …

BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP

J Bai, K Gao, S Min, ST Xia, Z Li… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Contrastive Vision-Language Pre-training, known as CLIP, has shown promising
effectiveness in addressing downstream image recognition tasks. However, recent works …

VTC-LFC: Vision transformer compression with low-frequency components

Z Wang, H Luo, P Wang, F Ding… - Advances in Neural …, 2022 - proceedings.neurips.cc
Although vision transformers (ViTs) have recently dominated many vision tasks,
deploying ViT models on resource-limited devices remains a challenging problem. To …

ALOFT: A lightweight MLP-like architecture with dynamic low-frequency transform for domain generalization

J Guo, N Wang, L Qi, Y Shi - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com
Domain generalization (DG) aims to learn a model that generalizes well to unseen
target domains utilizing multiple source domains without re-training. Most existing DG works …

Towards reliable and efficient backdoor trigger inversion via decoupling benign features

X Xu, K Huang, Y Li, Z Qin, K Ren - The Twelfth International …, 2024 - openreview.net
Recent studies revealed that using third-party models may lead to backdoor threats, where
adversaries can maliciously manipulate model predictions based on backdoors implanted …

DynaMixer: A vision MLP architecture with dynamic mixing

Z Wang, W Jiang, YM Zhu, L Yuan… - … on machine learning, 2022 - proceedings.mlr.press
Recently, MLP-like vision models have achieved promising performances on mainstream
visual recognition tasks. In contrast with vision transformers and CNNs, the success of MLP …

Improving adversarial robustness of masked autoencoders via test-time frequency-domain prompting

Q Huang, X Dong, D Chen, Y Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this paper, we investigate the adversarial robustness of vision transformers that are
equipped with BERT pretraining (e.g., BEiT, MAE). A surprising observation is that MAE has …

A general-purpose edge-feature guidance module to enhance vision transformers for plant disease identification

B Chang, Y Wang, X Zhao, G Li, P Yuan - Expert Systems with Applications, 2024 - Elsevier
As agricultural applications are specialized, plant diseases are diverse, and there is a lack of
agricultural datasets, current plant disease identification performance is inadequate. In this …