ProbVLM: Probabilistic adapter for frozen vision-language models

U Upadhyay, S Karthik, M Mancini… - Proceedings of the …, 2023 - openaccess.thecvf.com
Large-scale vision-language models (VLMs) like CLIP successfully find correspondences
between images and text. Through the standard deterministic mapping process, an image or …

Improved probabilistic image-text representations

S Chun - arXiv preprint arXiv:2305.18171, 2023 - arxiv.org
Image-Text Matching (ITM) task, a fundamental vision-language (VL) task, suffers from the
inherent ambiguity arising from multiplicity and imperfect annotations. Deterministic …

Prototype-based aleatoric uncertainty quantification for cross-modal retrieval

H Li, J Song, L Gao, X Zhu… - Advances in Neural …, 2024 - proceedings.neurips.cc
Cross-modal Retrieval methods build similarity relations between vision and language
modalities by jointly learning a common representation space. However, the predictions are …

Negative Pre-aware for Noisy Cross-Modal Matching

X Zhang, H Li, M Ye - Proceedings of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Cross-modal noise-robust learning is a challenging task since noisy correspondence is hard
to recognize and rectify. Due to the cumulative and unavoidable negative impact of …

Learning from noisy correspondence with tri-partition for cross-modal matching

Z Feng, Z Zeng, C Guo, Z Li, L Hu - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Due to high labeling cost, it is inevitable to introduce a certain proportion of noisy
correspondence into visual-text datasets, resulting in poor model robustness for cross-modal …

Clustering swap prediction for image-text pre-training

S Fayou, HC Ngo, YW Sek, Z Meng - Scientific Reports, 2024 - nature.com
It is essential to delve into the strategy of multimodal model pre-training, which has an obvious
impact on downstream tasks. Currently, clustering learning has achieved noteworthy …

Image-text Retrieval with Main Semantics Consistency

Y Xie, Y Wang, Y Xie, X Tan, J Li, X Li, W Peng… - Proceedings of the 33rd …, 2024 - dl.acm.org
Image-text retrieval (ITR) has been one of the primary tasks in cross-modal retrieval, serving
as a crucial bridge between computer vision and natural language processing. Significant …

Image-text Retrieval via Preserving Main Semantics of Vision

X Zhang, X Niu, P Fournier-Viger… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Image-text retrieval is one of the major tasks of cross-modal retrieval. Several approaches
for this task map images and texts into a common space to create correspondences between …

Exploring the applicability of spectral recovery in semantic segmentation of RGB images

Z Du, S Wei, T Liu, S Zhang, X Chen… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Compared with RGB images, hyperspectral images (HSIs) offer a distinct advantage in that
they can record continuous spectral bands of light reflectance in each pixel, reflecting the …

Multi-layer Probabilistic Association Reasoning Network for Image-Text Retrieval

W Li, R Xiong, X Fan - … on Circuits and Systems for Video …, 2024 - ieeexplore.ieee.org
With the advancement of deep learning, the task of image-text retrieval has received
widespread attention for addressing the semantic heterogeneity in multimodal data …