Channel augmented joint learning for visible-infrared recognition
This paper introduces a powerful channel augmented joint learning strategy for the visible-
infrared recognition problem. For data augmentation, most existing methods directly adopt …
Negative-aware attention framework for image-text matching
Image-text matching, as a fundamental task, bridges the gap between vision and language.
The key to this task is to accurately measure the similarity between these two modalities. Prior …
Partially view-aligned representation learning with noise-robust contrastive loss
In real-world applications, it is common that only a portion of data is aligned across views
due to spatial, temporal, or spatiotemporal asynchronism, thus leading to the so-called …
Image-text embedding learning via visual and textual semantic reasoning
As a bridge between language and vision domains, cross-modal retrieval between images
and texts has been a hot research topic in recent years. It remains challenging because the current …
Universal weighting metric learning for cross-modal retrieval
Cross-modal retrieval, which aims to match instances captured from different modalities, has
recently attracted growing attention. The performance of cross-modal retrieval …
Improving cross-modal retrieval with set of diverse embeddings
Cross-modal retrieval across image and text modalities is a challenging task due to its
inherent ambiguity: An image often exhibits various situations, and a caption can be coupled …
On metric learning for audio-text cross-modal retrieval
Audio-text retrieval aims at retrieving a target audio clip or caption from a pool of candidates
given a query in another modality. Solving such a cross-modal retrieval task is challenging …
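A minimal sketch of the retrieval setting described above, assuming both modalities have already been encoded into vectors in a shared embedding space (the encoders are not shown). Candidates are ranked by cosine similarity to the query, and Recall@K counts how often the ground-truth item lands in the top K. The helpers `cosine`, `rank_candidates`, and `recall_at_k` are hypothetical names for illustration; the metric-learning objectives studied in the paper are not reproduced here.

```python
import math
from typing import List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_candidates(query: Sequence[float], candidates: Sequence[Sequence[float]]) -> List[int]:
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)

def recall_at_k(queries, candidates, targets, k: int) -> float:
    """Fraction of queries whose ground-truth candidate index appears in the top k."""
    hits = sum(1 for q, t in zip(queries, targets) if t in rank_candidates(q, candidates)[:k])
    return hits / len(queries)

# Toy example: 2-D vectors standing in for audio embeddings (queries) and caption embeddings (candidates).
captions = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]
audio = [[1.0, 0.0], [0.0, 1.0]]
print(rank_candidates(audio[0], captions))                # [1, 0, 2]
print(recall_at_k(audio, captions, targets=[1, 0], k=1))  # 1.0
```

The same ranking applies in either direction (audio-to-text or text-to-audio); only the roles of queries and candidates swap.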
Channel augmentation for visible-infrared re-identification
This paper introduces a simple yet powerful channel augmentation for visible-infrared re-
identification. Most existing augmentation operations designed for single-modality visible …
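The snippet does not describe the operation itself. As a hedged illustration only, one plausible form of channel-level augmentation copies a single randomly chosen color channel into all three channels of a visible image, producing a color-free image closer in appearance to an infrared one. This is an assumption, not the paper's confirmed procedure, and `random_channel_exchange` is a hypothetical helper.

```python
import random
from typing import List

Pixel = List[float]        # one [R, G, B] triple
Image = List[List[Pixel]]  # rows of pixels, i.e. H x W x 3

def random_channel_exchange(image: Image, rng: random.Random) -> Image:
    """Copy one randomly chosen color channel into all three channels.

    Hypothetical illustration of a channel-level augmentation; the exact
    operation used in the paper may differ.
    """
    c = rng.randrange(3)  # 0 = R, 1 = G, 2 = B, chosen uniformly
    # Build a new image; the input is left unmodified.
    return [[[px[c]] * 3 for px in row] for row in image]

# Usage: a 1 x 2 toy image; the output keeps the spatial layout but drops color.
img = [[[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]]]
out = random_channel_exchange(img, random.Random(0))
print(out)
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible under a fixed seed.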
Multilateral semantic relations modeling for image text retrieval
Z Wang, Z Gao, K Guo, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Image-text retrieval is a fundamental task to bridge vision and language by exploiting
various strategies for fine-grained alignment between regions and words. This is still tough …
CODER: Coupled diversity-sensitive momentum contrastive learning for image-text retrieval
Image-Text Retrieval (ITR) is challenging in bridging visual and lingual modalities.
Contrastive learning has been adopted by most prior work. Except for the limited amount of …
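For reference, a common form of the contrastive objective mentioned above is the symmetric InfoNCE loss over a batch of matched image-text pairs. The sketch below assumes a precomputed N x N similarity matrix with matched pairs on the diagonal and a temperature of 0.07 (an illustrative default). It is the generic loss, not the paper's diversity-sensitive momentum variant, and `info_nce` is a hypothetical name.

```python
import math
from typing import List

def info_nce(sim: List[List[float]], temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss for an N x N similarity matrix.

    sim[i][j] scores image i against text j; matched pairs lie on the diagonal.
    """
    n = len(sim)

    def mean_cross_entropy(matrix: List[List[float]]) -> float:
        total = 0.0
        for i, row in enumerate(matrix):
            logits = [s / temperature for s in row]
            m = max(logits)  # subtract the max for a numerically stable log-sum-exp
            log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
            total += log_sum_exp - logits[i]  # -log softmax probability of the positive pair
        return total / n

    transposed = [list(col) for col in zip(*sim)]  # text-to-image direction
    return 0.5 * (mean_cross_entropy(sim) + mean_cross_entropy(transposed))

aligned = [[1.0, 0.0], [0.0, 1.0]]
uninformative = [[0.5, 0.5], [0.5, 0.5]]
print(info_nce(aligned))        # close to 0
print(info_nce(uninformative))  # log(2), about 0.693
```

Averaging the image-to-text and text-to-image directions treats both modalities symmetrically; each row is a softmax over the batch, so every other item in the batch acts as a negative.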