Learning mask-aware CLIP representations for zero-shot segmentation
Recently, pre-trained vision-language models have been increasingly used to tackle the
challenging zero-shot segmentation task. Typical solutions follow the paradigm of first …
Preventing zero-shot transfer degradation in continual learning of vision-language models
Continual learning (CL) can help pre-trained vision-language models efficiently adapt to
new or under-trained data distributions without re-training. Nevertheless, during the …
CLIPood: Generalizing CLIP to out-of-distributions
Abstract Out-of-distribution (OOD) generalization, where the model needs to handle
distribution shifts from training, is a major challenge of machine learning. Contrastive …
PromptRestorer: A prompting image restoration method with degradation perception
We show that raw degradation features can effectively guide deep restoration models,
providing accurate degradation priors to facilitate better restoration. While networks that do …
Waffling around for performance: Visual classification with random words and broad concepts
The visual classification performance of vision-language models such as CLIP has been
shown to benefit from additional semantic knowledge from large language models (LLMs) …
What can human sketches do for object detection?
Sketches are highly expressive, inherently capturing subjective and fine-grained visual
cues. The exploration of such innate properties of human sketches has, however, been …
Octopus: Embodied vision-language programmer from environmental feedback
Large vision-language models (VLMs) have achieved substantial progress in multimodal
perception and reasoning. Furthermore, when seamlessly integrated into an embodied …
FLIP: Cross-domain face anti-spoofing with language guidance
K Srivatsan, M Naseer… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Face anti-spoofing (FAS) or presentation attack detection is an essential component of face
recognition systems deployed in security-critical applications. Existing FAS methods have …
SwapPrompt: Test-time prompt adaptation for vision-language models
Test-time adaptation (TTA) is a special and practical setting in unsupervised domain
adaptation, which allows a pre-trained model in a source domain to adapt to unlabeled test …
ViewRefer: Grasp the multi-view knowledge for 3D visual grounding
Understanding 3D scenes from multi-view inputs has been proven to alleviate the view
discrepancy issue in 3D visual grounding. However, existing methods normally neglect the …