Nevlp: Noise-robust framework for efficient vision-language pre-training
The success of Vision Language Models (VLMs) on various vision-language tasks heavily
relies on pre-training with large scale web-crawled datasets. However, the noisy and …
relies on pre-training with large scale web-crawled datasets. However, the noisy and …