Visual tuning

BXB Yu, J Chang, H Wang, L Liu, S Wang… - ACM Computing …, 2024 - dl.acm.org
Fine-tuning visual models has been widely shown to yield promising performance on many
downstream visual tasks. With the surprising development of pre-trained visual foundation …

Improving zero-shot generalization for CLIP with synthesized prompts

Z Wang, J Liang, R He, N Xu… - Proceedings of the …, 2023 - openaccess.thecvf.com
With the growing interest in pretrained vision-language models like CLIP, recent research
has focused on adapting these models to downstream tasks. Despite achieving promising …

CT image denoising and deblurring with deep learning: current status and perspectives

Y Lei, C Niu, J Zhang, G Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
This article reviews deep learning methods for computed tomography image denoising
and deblurring, both separately and simultaneously. Then, we discuss promising directions in this …

BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP

J Bai, K Gao, S Min, ST Xia, Z Li… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Contrastive Vision-Language Pre-training, known as CLIP, has shown promising
effectiveness in addressing downstream image recognition tasks. However, recent works …

DePT: Decoupled prompt tuning

J Zhang, S Wu, L Gao, HT Shen… - Proceedings of the …, 2024 - openaccess.thecvf.com
This work breaks through the Base-New Tradeoff (BNT) dilemma in prompt tuning, i.e., the
better the tuned model generalizes to the base (or target) task, the worse it generalizes to …

PromptKD: Unsupervised prompt distillation for vision-language models

Z Li, X Li, X Fu, X Zhang, W Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Prompt learning has emerged as a valuable technique in enhancing vision-language
models (VLMs) such as CLIP for downstream tasks in specific domains. Existing work mainly …

SCLIP: Rethinking self-attention for dense vision-language inference

F Wang, J Mei, A Yuille - arXiv preprint arXiv:2312.01597, 2023 - arxiv.org
Recent advances in contrastive language-image pretraining (CLIP) have demonstrated
strong capabilities in zero-shot classification by aligning visual representations with target …

Learning to adapt CLIP for few-shot monocular depth estimation

X Hu, C Zhang, Y Zhang, B Hai… - Proceedings of the …, 2024 - openaccess.thecvf.com
Pre-trained Visual-Language Models (VLMs), such as CLIP, have shown enhanced
performance across a range of tasks that involve the integration of visual and linguistic …

Black-box tuning of vision-language models with effective gradient approximation

Z Guo, Y Wei, M Liu, Z Ji, J Bai, Y Guo… - arXiv preprint arXiv …, 2023 - arxiv.org
Parameter-efficient fine-tuning (PEFT) methods have provided an effective way of adapting
large vision-language models to specific tasks or scenarios. Typically, they learn a very …

Tuning multi-mode token-level prompt alignment across modalities

D Wang, M Li, X Liu, MS Xu… - Advances in Neural …, 2024 - proceedings.neurips.cc
Advancements in prompt tuning of vision-language models have underscored their potential
in enhancing open-world visual concept comprehension. However, prior works only …