Channel augmented joint learning for visible-infrared recognition

M Ye, W Ruan, B Du, MZ Shou - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
This paper introduces a powerful channel augmented joint learning strategy for the
visible-infrared recognition problem. For data augmentation, most existing methods directly adopt …

Negative-aware attention framework for image-text matching

K Zhang, Z Mao, Q Wang… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Image-text matching, as a fundamental task, bridges the gap between vision and language.
The key of this task is to accurately measure similarity between these two modalities. Prior …

Partially view-aligned representation learning with noise-robust contrastive loss

M Yang, Y Li, Z Huang, Z Liu, P Hu… - Proceedings of the …, 2021 - openaccess.thecvf.com
In real-world applications, it is common that only a portion of data is aligned across views
due to spatial, temporal, or spatiotemporal asynchronism, thus leading to the so-called …

Image-text embedding learning via visual and textual semantic reasoning

K Li, Y Zhang, K Li, Y Li, Y Fu - IEEE transactions on pattern …, 2022 - ieeexplore.ieee.org
As a bridge between language and vision domains, cross-modal retrieval between images
and texts is a hot research topic in recent years. It remains challenging because the current …

Universal weighting metric learning for cross-modal retrieval

J Wei, Y Yang, X Xu, X Zhu… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Cross-modal retrieval has recently attracted growing attention, which aims to match
instances captured from different modalities. The performance of cross-modal retrieval …

Improving cross-modal retrieval with set of diverse embeddings

D Kim, N Kim, S Kwak - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Cross-modal retrieval across image and text modalities is a challenging task due to its
inherent ambiguity: An image often exhibits various situations, and a caption can be coupled …

On metric learning for audio-text cross-modal retrieval

X Mei, X Liu, J Sun, MD Plumbley, W Wang - arXiv preprint arXiv …, 2022 - arxiv.org
Audio-text retrieval aims at retrieving a target audio clip or caption from a pool of candidates
given a query in another modality. Solving such cross-modal retrieval task is challenging …

Channel augmentation for visible-infrared re-identification

M Ye, Z Wu, C Chen, B Du - IEEE Transactions on Pattern …, 2023 - ieeexplore.ieee.org
This paper introduces a simple yet powerful channel augmentation for visible-infrared
re-identification. Most existing augmentation operations designed for single-modality visible …

Multilateral semantic relations modeling for image text retrieval

Z Wang, Z Gao, K Guo, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Image-text retrieval is a fundamental task to bridge vision and language by exploiting
various strategies to fine-grained alignment between regions and words. This is still tough …

CODER: Coupled diversity-sensitive momentum contrastive learning for image-text retrieval

H Wang, D He, W Wu, B Xia, M Yang, F Li, Y Yu… - … on Computer Vision, 2022 - Springer
Image-Text Retrieval (ITR) is challenging in bridging visual and lingual modalities.
Contrastive learning has been adopted by most prior arts. Except for limited amount of …