Learning transferable visual models from natural language supervision

A Radford, JW Kim, C Hallacy, et al. - International Conference on Machine Learning, 2021 - proceedings.mlr.press
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined
object categories. This restricted form of supervision limits their generality and usability since …

K-LITE: Learning transferable visual models with external knowledge

S Shen, C Li, X Hu, Y Xie, J Yang, et al. - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
The new generation of state-of-the-art computer vision systems is trained from natural
language supervision, ranging from simple object category names to descriptive captions …

ELEVATER: A benchmark and toolkit for evaluating language-augmented visual models

C Li, H Liu, L Li, P Zhang, J Aneja, et al. - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
Learning visual representations from natural language supervision has recently shown great
promise in a number of pioneering works. In general, these language-augmented visual …

SuS-X: Training-free name-only transfer of vision-language models

V Udandarao, A Gupta, S Albanie - Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023 - openaccess.thecvf.com
Contrastive Language-Image Pre-training (CLIP) has emerged as a simple yet
effective way to train large-scale vision-language models. CLIP demonstrates impressive zero-shot …

Generative pretraining from pixels

M Chen, A Radford, R Child, J Wu, et al. - International Conference on Machine Learning, 2020 - proceedings.mlr.press
Inspired by progress in unsupervised representation learning for natural language, we
examine whether similar models can learn useful representations for images. We train a …

Learning vision from models rivals learning vision from data

Y Tian, L Fan, K Chen, D Katabi, et al. - Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024 - openaccess.thecvf.com
We introduce SynCLR, a novel approach for learning visual representations exclusively from
synthetic images without any real data. We synthesize a large dataset of image captions …

Learning to decompose visual features with latent textual prompts

F Wang, M Li, X Lin, H Lv, AG Schwing, H Ji - arXiv preprint, 2022 - arxiv.org
Recent advances in pre-training vision-language models like CLIP have shown great
potential in learning transferable visual representations. Nonetheless, for downstream …

StableRep: Synthetic images from text-to-image models make strong visual representation learners

Y Tian, L Fan, P Isola, H Chang, et al. - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
We investigate the potential of learning visual representations using synthetic images
generated by text-to-image models. This is a natural question in light of the excellent …

VILA: On pre-training for visual language models

J Lin, H Yin, W Ping, P Molchanov, et al. - Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024 - openaccess.thecvf.com
Visual language models (VLMs) have progressed rapidly with the recent success of large
language models. There have been growing efforts on visual instruction tuning to extend the …
