Authors
Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu
Publication date
2022
Journal
International Journal of Computer Vision
Abstract
Large pre-trained vision-language models like CLIP have shown great potential in learning representations that are transferable across a wide range of downstream tasks. Different from the traditional representation learning that is based mostly on discretized labels, vision-language pre-training aligns images and texts in a common feature space, which allows zero-shot transfer to a downstream task via prompting, i.e., classification weights are synthesized from natural language describing classes of interest. In this work, we show that a major challenge for deploying such models in practice is prompt engineering, which requires domain expertise and is extremely time-consuming—one needs to spend a significant amount of time on words tuning since a slight change in wording could have a huge impact on performance. Inspired by recent advances in prompt learning research in natural language processing (NLP …
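The zero-shot transfer mechanism the abstract describes can be sketched in miniature: each class name is wrapped in a natural-language prompt, the prompt is encoded into the shared feature space, and the image is assigned to the class whose prompt embedding is most similar. The snippet below is a minimal illustration of that scoring step only, with hand-made 3-dimensional vectors standing in for the outputs of CLIP's image and text encoders (the embeddings and prompt strings are hypothetical, not CLIP outputs):

```python
import math


def normalize(v):
    """L2-normalize a vector so the dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def zero_shot_classify(image_emb, prompt_embs):
    """Return the prompt whose embedding has the highest cosine
    similarity with the image embedding, as in CLIP-style zero-shot
    classification where prompts act as classification weights."""
    img = normalize(image_emb)
    scores = {
        prompt: sum(a * b for a, b in zip(img, normalize(emb)))
        for prompt, emb in prompt_embs.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical embeddings standing in for encoder outputs.
prompt_embs = {
    "a photo of a cat": [1.0, 0.1, 0.0],
    "a photo of a dog": [0.0, 1.0, 0.2],
}
label, scores = zero_shot_classify([0.9, 0.2, 0.0], prompt_embs)
print(label)  # the cat prompt scores highest for this toy image vector
```

Because the classifier is just the set of prompt embeddings, changing the wording of the prompt changes the classification weights directly, which is why the paper's observation about prompt sensitivity matters in practice.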
Total citations
Articles in Google Scholar
K Zhou, J Yang, CC Loy, Z Liu - International Journal of Computer Vision, 2022