Learning transferable visual models from natural language supervision
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined
object categories. This restricted form of supervision limits their generality and usability since …
K-lite: Learning transferable visual models with external knowledge
The new generation of state-of-the-art computer vision systems is trained from natural
language supervision, ranging from simple object category names to descriptive captions …
Elevater: A benchmark and toolkit for evaluating language-augmented visual models
Learning visual representations from natural language supervision has recently shown great
promise in a number of pioneering works. In general, these language-augmented visual …
Sus-x: Training-free name-only transfer of vision-language models
V Udandarao, A Gupta… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Contrastive Language-Image Pre-training (CLIP) has emerged as a simple yet
effective way to train large-scale vision-language models. CLIP demonstrates impressive …
Generative pretraining from pixels
Inspired by progress in unsupervised representation learning for natural language, we
examine whether similar models can learn useful representations for images. We train a …
Learning vision from models rivals learning vision from data
We introduce SynCLR, a novel approach for learning visual representations exclusively from
synthetic images without any real data. We synthesize a large dataset of image captions …
Learning to decompose visual features with latent textual prompts
Recent advances in pre-training vision-language models like CLIP have shown great
potential in learning transferable visual representations. Nonetheless, for downstream …
Stablerep: Synthetic images from text-to-image models make strong visual representation learners
We investigate the potential of learning visual representations using synthetic images
generated by text-to-image models. This is a natural question in the light of the excellent …
Vila: On pre-training for visual language models
Visual language models (VLMs) have progressed rapidly with the recent success of large
language models. There have been growing efforts on visual instruction tuning to extend the …