Authors
Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu
Publication date
2022
Journal
International Journal of Computer Vision
Abstract
Large pre-trained vision-language models like CLIP have shown great potential in learning representations that are transferable across a wide range of downstream tasks. Different from the traditional representation learning that is based mostly on discretized labels, vision-language pre-training aligns images and texts in a common feature space, which allows zero-shot transfer to a downstream task via prompting, i.e., classification weights are synthesized from natural language describing classes of interest. In this work, we show that a major challenge for deploying such models in practice is prompt engineering, which requires domain expertise and is extremely time-consuming—one needs to spend a significant amount of time on words tuning since a slight change in wording could have a huge impact on performance. Inspired by recent advances in prompt learning research in natural language processing (NLP …
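The zero-shot transfer mechanism the abstract describes can be sketched in miniature: each class name is wrapped in a natural-language prompt, the prompt is encoded into the shared feature space, and the image is assigned to the class whose prompt embedding is most similar. The snippet below is a minimal illustration of that scoring step only, with hand-made 3-dimensional vectors standing in for the outputs of CLIP's image and text encoders (the embeddings and prompt strings are hypothetical, not CLIP outputs):

```python
import math


def normalize(v):
    """L2-normalize a vector so the dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def zero_shot_classify(image_emb, prompt_embs):
    """Return the prompt whose embedding has the highest cosine
    similarity with the image embedding, as in CLIP-style zero-shot
    classification where prompts act as classification weights."""
    img = normalize(image_emb)
    scores = {
        prompt: sum(a * b for a, b in zip(img, normalize(emb)))
        for prompt, emb in prompt_embs.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical embeddings standing in for encoder outputs.
prompt_embs = {
    "a photo of a cat": [1.0, 0.1, 0.0],
    "a photo of a dog": [0.0, 1.0, 0.2],
}
label, scores = zero_shot_classify([0.9, 0.2, 0.0], prompt_embs)
print(label)  # the cat prompt scores highest for this toy image vector
```

Because the classifier is just the set of prompt embeddings, changing the wording of the prompt changes the classification weights directly, which is why the paper's observation about prompt sensitivity matters in practice.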
Total citations
Articles in Google Scholar
K Zhou, J Yang, CC Loy, Z Liu - International Journal of Computer Vision, 2022