Words Matter: Leveraging Individual Text Embeddings for Code Generation in CLIP Test-Time Adaptation

S Mishra, J Silva-Rodríguez, IB Ayed… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-language foundation models, such as CLIP, have shown unprecedented zero-shot
performance across a wide range of tasks. Nevertheless, these models may be unreliable …

WATT: Weight Average Test-Time Adaptation of CLIP

D Osowiechi, M Noori, GAV Hakim… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-Language Models (VLMs) such as CLIP have yielded unprecedented performance
for zero-shot image classification, yet their generalization capability may still be seriously …