Non-contrastive learning meets language-image pre-training
Contrastive language-image pre-training (CLIP) serves as a de facto standard to align
images and texts. Nonetheless, the loose correlation between images and texts of web …
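Many of the entries below refer to the same contrastive alignment objective: a symmetric cross-entropy over cosine similarities between image and text embeddings within a batch. The following is a minimal PyTorch sketch of that generic loss, assuming pre-computed image_emb and text_emb batches and an illustrative temperature value; it is not the implementation from any specific paper listed here.

import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    # L2-normalize both modalities so dot products are cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity matrix: logits[i, j] = sim(image_i, text_j) / T.
    logits = image_emb @ text_emb.t() / temperature

    # Matched image-text pairs lie on the diagonal of the similarity matrix.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy over the image-to-text and text-to-image directions.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2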
CALIP: Zero-shot enhancement of CLIP with parameter-free attention
Contrastive Language-Image Pre-training (CLIP) has been shown to learn visual
representations with promising zero-shot performance. To further improve its downstream …
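Several of these works build on CLIP's zero-shot recognition, which scores an image embedding against text embeddings of prompted class names. A minimal sketch using the open-source clip package follows; the model name, class names, prompt template, and image path are illustrative assumptions rather than details taken from any paper above.

import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # assumed backbone choice

class_names = ["cat", "dog", "car"]                 # hypothetical label set
prompts = [f"a photo of a {c}" for c in class_names]

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)  # placeholder path
text = clip.tokenize(prompts).to(device)

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text)
    # Normalize, then softmax over scaled cosine similarities to rank the class prompts.
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_feat @ text_feat.T).softmax(dim=-1)

print(dict(zip(class_names, probs[0].tolist())))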
Filtering, distillation, and hard negatives for vision-language pre-training
Vision-language models trained with contrastive learning on large-scale noisy data are
becoming increasingly popular for zero-shot recognition problems. In this paper we improve …
SuS-X: Training-free name-only transfer of vision-language models
Contrastive Language-Image Pre-training (CLIP) has emerged as a simple yet
effective way to train large-scale vision-language models. CLIP demonstrates impressive …
RA-CLIP: Retrieval augmented contrastive language-image pre-training
Contrastive Language-Image Pre-training (CLIP) is attracting increasing attention
for its impressive zero-shot recognition performance on different downstream tasks …
Long-CLIP: Unlocking the long-text capability of CLIP
Contrastive Language-Image Pre-training (CLIP) has been the cornerstone for zero-
shot classification, text-image retrieval, and text-image generation by aligning image and …
Supervision exists everywhere: A data efficient contrastive language-image pre-training paradigm
Recently, large-scale Contrastive Language-Image Pre-training (CLIP) has attracted
unprecedented attention for its impressive zero-shot recognition ability and excellent …
Unified contrastive learning in image-text-label space
Visual recognition has recently been learned via either supervised learning on human-annotated
image-label data or language-image contrastive learning with webly-crawled image-text …
Chinese CLIP: Contrastive vision-language pretraining in Chinese
The tremendous success of CLIP (Radford et al., 2021) has promoted the research and
application of contrastive learning for vision-language pretraining. In this work, we construct …
ReCLIP: Refine contrastive language-image pre-training with source-free domain adaptation
Large-scale pre-trained vision-language models (VLMs) such as CLIP have demonstrated
outstanding performance in zero-shot classification, e.g., achieving 76.3% top-1 accuracy on …