Improving zero-shot generalization for CLIP with synthesized prompts
With the growing interest in pretrained vision-language models like CLIP, recent research
has focused on adapting these models to downstream tasks. Despite achieving promising …
CT image denoising and deblurring with deep learning: current status and perspectives
This article reviews the deep learning methods for computed tomography image denoising
and deblurring separately and simultaneously. Then, we discuss promising directions in this …
BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP
Contrastive Vision-Language Pre-training, known as CLIP, has shown promising
effectiveness in addressing downstream image recognition tasks. However, recent works …
DePT: Decoupled prompt tuning
This work breaks through the Base-New Tradeoff (BNT) dilemma in prompt tuning, i.e., the
better the tuned model generalizes to the base (or target) task, the worse it generalizes to …
PromptKD: Unsupervised prompt distillation for vision-language models
Prompt learning has emerged as a valuable technique in enhancing vision-language
models (VLMs) such as CLIP for downstream tasks in specific domains. Existing work mainly …
SCLIP: Rethinking self-attention for dense vision-language inference
Recent advances in contrastive language-image pretraining (CLIP) have demonstrated
strong capabilities in zero-shot classification by aligning visual representations with target …
Learning to adapt CLIP for few-shot monocular depth estimation
Pre-trained Visual-Language Models (VLMs), such as CLIP, have shown enhanced
performance across a range of tasks that involve the integration of visual and linguistic …
Black-box tuning of vision-language models with effective gradient approximation
Parameter-efficient fine-tuning (PEFT) methods have provided an effective way for adapting
large vision-language models to specific tasks or scenarios. Typically, they learn a very …
Tuning multi-mode token-level prompt alignment across modalities
Advancements in prompt tuning of vision-language models have underscored their potential
in enhancing open-world visual concept comprehension. However, prior works only …