NEVLP: Noise-robust framework for efficient vision-language pre-training

Y Tao, Z Wang, H Zhang, L Wang - arXiv preprint arXiv:2409.09582, 2024 - arxiv.org
The success of Vision Language Models (VLMs) on various vision-language tasks heavily
relies on pre-training with large-scale web-crawled datasets. However, the noisy and …