Deep class-incremental learning: A survey

DW Zhou, QW Wang, ZH Qi, HJ Ye, DC Zhan… - arXiv preprint arXiv …, 2023 - arxiv.org
Deep models, e.g., CNNs and Vision Transformers, have achieved impressive results
in many vision tasks in the closed world. However, novel classes emerge from time to time in …

Fine-tuning can distort pretrained features and underperform out-of-distribution

A Kumar, A Raghunathan, R Jones, T Ma… - arXiv preprint arXiv …, 2022 - arxiv.org
When transferring a pretrained model to a downstream task, two popular methods are full
fine-tuning (updating all the model parameters) and linear probing (updating only the last …
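The snippet above contrasts the two transfer strategies by which parameters they update. A minimal sketch of that distinction, with an illustrative toy parameter dictionary (the names and structure are assumptions, not taken from the paper):

```python
def trainable_params(model, mode):
    """Return the parameter names updated under each transfer strategy.

    full_finetune: every parameter in the network is updated.
    linear_probe:  only the final linear head is updated; the
                   pretrained backbone stays frozen.
    """
    if mode == "full_finetune":
        return list(model)  # all parameters
    if mode == "linear_probe":
        return [name for name in model if name.startswith("head")]
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical model: a frozen backbone plus a linear classification head.
model = {
    "backbone.layer1.w": ...,
    "backbone.layer2.w": ...,
    "head.w": ...,
    "head.b": ...,
}

trainable_params(model, "linear_probe")  # only the head parameters
```

In a real framework this corresponds to toggling gradient tracking per parameter group (e.g. freezing the backbone) rather than filtering names by hand.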

Optimizing prompts for text-to-image generation

Y Hao, Z Chi, L Dong, F Wei - Advances in Neural …, 2024 - proceedings.neurips.cc
Well-designed prompts can guide text-to-image models to generate amazing images.
However, the performant prompts are often model-specific and misaligned with user input …

LST: Ladder side-tuning for parameter and memory efficient transfer learning

YL Sung, J Cho, M Bansal - Advances in Neural …, 2022 - proceedings.neurips.cc
Fine-tuning large pre-trained models on downstream tasks has been adopted in a variety of
domains recently. However, it is costly to update the entire parameter set of large pre-trained …

Robust fine-tuning of zero-shot models

M Wortsman, G Ilharco, JW Kim, M Li… - Proceedings of the …, 2022 - openaccess.thecvf.com
Large pre-trained models such as CLIP or ALIGN offer consistent accuracy across a range of
data distributions when performing zero-shot inference (i.e., without fine-tuning on a specific …

VL-Adapter: Parameter-efficient transfer learning for vision-and-language tasks

YL Sung, J Cho, M Bansal - … of the IEEE/CVF conference on …, 2022 - openaccess.thecvf.com
Recently, fine-tuning language models pre-trained on large text corpora has provided huge
improvements on vision-and-language (V&L) tasks as well as on pure language tasks …

Is synthetic data from generative models ready for image recognition?

R He, S Sun, X Yu, C Xue, W Zhang, P Torr… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent text-to-image generation models have shown promising results in generating high-
fidelity photo-realistic images. Though the results are astonishing to human eyes, how …

Self-regulating prompts: Foundational model adaptation without forgetting

MU Khattak, ST Wasim, M Naseer… - Proceedings of the …, 2023 - openaccess.thecvf.com
Prompt learning has emerged as an efficient alternative for fine-tuning foundational models,
such as CLIP, for various downstream tasks. Conventionally trained using the task-specific …
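Prompt learning, as described in this snippet, replaces hand-written text prompts with a small set of learnable context vectors prepended to a frozen class-token embedding. A minimal sketch of that input construction; the context length and embedding dimension here are illustrative assumptions, not values from the paper:

```python
import numpy as np

def build_prompted_input(context, class_embedding):
    """Prepend learnable context vectors to a frozen class-token embedding.

    context:         (n_ctx, dim) learnable vectors, trained by gradient descent
    class_embedding: (dim,) frozen embedding of the class name token
    """
    return np.concatenate([context, class_embedding[None, :]], axis=0)


n_ctx, dim = 4, 512            # assumed context length and embedding size
context = np.zeros((n_ctx, dim))  # initialized here to zeros for illustration
cls_emb = np.random.rand(dim)

prompted = build_prompted_input(context, cls_emb)  # shape (n_ctx + 1, dim)
```

Only `context` would receive gradients during adaptation; the text encoder and class embeddings stay frozen.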

Prompting visual-language models for efficient video understanding

C Ju, T Han, K Zheng, Y Zhang, W Xie - European Conference on …, 2022 - Springer
Image-based visual-language (I-VL) pre-training has shown great success for learning joint
visual-textual representations from large-scale web data, revealing remarkable ability for …

Exploring visual prompts for adapting large-scale models

H Bahng, A Jahanian, S Sankaranarayanan… - arXiv preprint arXiv …, 2022 - arxiv.org
We investigate the efficacy of visual prompting to adapt large-scale models in vision.
Following the recent approach from prompt tuning and adversarial reprogramming, we learn …
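Visual prompting, as framed here, adapts a frozen model by learning a single pixel-space perturbation that is added to every input image. A minimal sketch of the input transformation; the image shape and the border-shaped pattern are illustrative assumptions:

```python
import numpy as np

def apply_visual_prompt(image, prompt):
    """Add a learned pixel perturbation to an image, keeping values in [0, 1].

    image:  (C, H, W) input in [0, 1]
    prompt: (C, H, W) learnable perturbation shared across all inputs
    """
    return np.clip(image + prompt, 0.0, 1.0)


# Hypothetical prompt: a pattern confined to the top border of the image.
prompt = np.zeros((3, 32, 32))
prompt[:, :4, :] = 0.5

image = np.random.rand(3, 32, 32)
prompted = apply_visual_prompt(image, prompt)
```

During adaptation only `prompt` is optimized (e.g. against the frozen model's loss), echoing the adversarial-reprogramming framing the snippet mentions.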