Visual tuning

BXB Yu, J Chang, H Wang, L Liu, S Wang… - ACM Computing …, 2024 - dl.acm.org
Fine-tuning visual models has been widely shown to yield promising performance on many
downstream visual tasks. With the surprising development of pre-trained visual foundation …

Improving zero-shot generalization for CLIP with synthesized prompts

Z Wang, J Liang, R He, N Xu… - Proceedings of the …, 2023 - openaccess.thecvf.com
With the growing interest in pretrained vision-language models like CLIP, recent research
has focused on adapting these models to downstream tasks. Despite achieving promising …

CT image denoising and deblurring with deep learning: current status and perspectives

Y Lei, C Niu, J Zhang, G Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
This article reviews deep learning methods for computed tomography image denoising
and deblurring, both separately and simultaneously. Then, we discuss promising directions in this …

BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP

J Bai, K Gao, S Min, ST Xia, Z Li… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Contrastive Vision-Language Pre-training, known as CLIP, has shown promising
effectiveness in addressing downstream image recognition tasks. However, recent works …

DePT: Decoupled prompt tuning

J Zhang, S Wu, L Gao, HT Shen… - Proceedings of the …, 2024 - openaccess.thecvf.com
This work breaks through the Base-New Tradeoff (BNT) dilemma in prompt tuning, i.e., the
better the tuned model generalizes to the base (or target) task, the worse it generalizes to …

PromptKD: Unsupervised prompt distillation for vision-language models

Z Li, X Li, X Fu, X Zhang, W Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Prompt learning has emerged as a valuable technique in enhancing vision-language
models (VLMs) such as CLIP for downstream tasks in specific domains. Existing work mainly …

SCLIP: Rethinking self-attention for dense vision-language inference

F Wang, J Mei, A Yuille - arXiv preprint arXiv:2312.01597, 2023 - arxiv.org
Recent advances in contrastive language-image pretraining (CLIP) have demonstrated
strong capabilities in zero-shot classification by aligning visual representations with target …

Learning to adapt CLIP for few-shot monocular depth estimation

X Hu, C Zhang, Y Zhang, B Hai… - Proceedings of the …, 2024 - openaccess.thecvf.com
Pre-trained Visual-Language Models (VLMs), such as CLIP, have shown enhanced
performance across a range of tasks that involve the integration of visual and linguistic …

Black-box tuning of vision-language models with effective gradient approximation

Z Guo, Y Wei, M Liu, Z Ji, J Bai, Y Guo… - arXiv preprint arXiv …, 2023 - arxiv.org
Parameter-efficient fine-tuning (PEFT) methods have provided an effective way of adapting
large vision-language models to specific tasks or scenarios. Typically, they learn a very …

Tuning multi-mode token-level prompt alignment across modalities

D Wang, M Li, X Liu, MS Xu… - Advances in Neural …, 2024 - proceedings.neurips.cc
Advancements in prompt tuning of vision-language models have underscored their potential
in enhancing open-world visual concept comprehension. However, prior works only …