ProbVLM: Probabilistic adapter for frozen vision-language models
Large-scale vision-language models (VLMs) like CLIP successfully find correspondences
between images and text. Through the standard deterministic mapping process, an image or …
Improved probabilistic image-text representations
S Chun - arXiv preprint arXiv:2305.18171, 2023 - arxiv.org
Image-Text Matching (ITM) task, a fundamental vision-language (VL) task, suffers from the
inherent ambiguity arising from multiplicity and imperfect annotations. Deterministic …
Prototype-based aleatoric uncertainty quantification for cross-modal retrieval
Cross-modal Retrieval methods build similarity relations between vision and language
modalities by jointly learning a common representation space. However, the predictions are …
Negative Pre-aware for Noisy Cross-Modal Matching
Cross-modal noise-robust learning is a challenging task since noisy correspondence is hard
to recognize and rectify. Due to the cumulative and unavoidable negative impact of …
Learning from noisy correspondence with tri-partition for cross-modal matching
Due to high labeling cost, it is inevitable to introduce a certain proportion of noisy
correspondence into visual-text datasets, resulting in poor model robustness for cross-modal …
Clustering swap prediction for image-text pre-training
It is essential to delve into the strategy of multimodal model pre-training, which has an obvious
impact on downstream tasks. Currently, clustering learning has achieved noteworthy …
Image-text Retrieval with Main Semantics Consistency
Image-text retrieval (ITR) has been one of the primary tasks in cross-modal retrieval, serving
as a crucial bridge between computer vision and natural language processing. Significant …
Image-text Retrieval via Preserving Main Semantics of Vision
X Zhang, X Niu, P Fournier-Viger… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Image-text retrieval is one of the major tasks of cross-modal retrieval. Several approaches
for this task map images and texts into a common space to create correspondences between …
Exploring the applicability of spectral recovery in semantic segmentation of RGB images
Compared with RGB images, hyperspectral images (HSIs) offer a distinct advantage in that
they can record continuous spectral bands of light reflectance in each pixel, reflecting the …
Multi-layer Probabilistic Association Reasoning Network for Image-Text Retrieval
With the advancement of deep learning, the task of image-text retrieval has received
widespread attention for addressing the semantic heterogeneity in multimodal data …