Universal weighting metric learning for cross-modal matching

Channel augmented joint learning for visible-infrared recognition

M Ye, W Ruan, B Du, MZ Shou - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

This paper introduces a powerful channel augmented joint learning strategy for the visible-
infrared recognition problem. For data augmentation, most existing methods directly adopt …

Cited by 258 Related articles All 4 versions

Negative-aware attention framework for image-text matching

K Zhang, Z Mao, Q Wang… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

Image-text matching, as a fundamental task, bridges the gap between vision and language.
The key of this task is to accurately measure similarity between these two modalities. Prior …

Cited by 136 Related articles All 4 versions

Partially view-aligned representation learning with noise-robust contrastive loss

M Yang, Y Li, Z Huang, Z Liu, P Hu… - Proceedings of the …, 2021 - openaccess.thecvf.com

In real-world applications, it is common that only a portion of data is aligned across views
due to spatial, temporal, or spatiotemporal asynchronism, thus leading to the so-called …

Cited by 167 Related articles All 5 versions

Image-text embedding learning via visual and textual semantic reasoning

K Li, Y Zhang, K Li, Y Li, Y Fu - IEEE transactions on pattern …, 2022 - ieeexplore.ieee.org

As a bridge between language and vision domains, cross-modal retrieval between images
and texts is a hot research topic in recent years. It remains challenging because the current …

Cited by 90 Related articles All 6 versions

Universal weighting metric learning for cross-modal retrieval

J Wei, Y Yang, X Xu, X Zhu… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

Cross-modal retrieval has recently attracted growing attention, which aims to match
instances captured from different modalities. The performance of cross-modal retrieval …

Cited by 88 Related articles All 5 versions

Improving cross-modal retrieval with set of diverse embeddings

D Kim, N Kim, S Kwak - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com

Cross-modal retrieval across image and text modalities is a challenging task due to its
inherent ambiguity: An image often exhibits various situations, and a caption can be coupled …

Cited by 37 Related articles All 7 versions

On metric learning for audio-text cross-modal retrieval

X Mei, X Liu, J Sun, MD Plumbley, W Wang - arXiv preprint arXiv …, 2022 - arxiv.org

Audio-text retrieval aims at retrieving a target audio clip or caption from a pool of candidates
given a query in another modality. Solving such cross-modal retrieval task is challenging …

Cited by 72 Related articles All 9 versions

Channel augmentation for visible-infrared re-identification

M Ye, Z Wu, C Chen, B Du - IEEE Transactions on Pattern …, 2023 - ieeexplore.ieee.org

This paper introduces a simple yet powerful channel augmentation for visible-infrared re-
identification. Most existing augmentation operations designed for single-modality visible …

Cited by 32 Related articles All 6 versions

Multilateral semantic relations modeling for image text retrieval

Z Wang, Z Gao, K Guo, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Image-text retrieval is a fundamental task to bridge vision and language by exploiting
various strategies to fine-grained alignment between regions and words. This is still tough …

Cited by 24 Related articles All 5 versions

Coder: Coupled diversity-sensitive momentum contrastive learning for image-text retrieval

H Wang, D He, W Wu, B Xia, M Yang, F Li, Y Yu… - … on Computer Vision, 2022 - Springer

Image-Text Retrieval (ITR) is challenging in bridging visual and lingual modalities.
Contrastive learning has been adopted by most prior arts. Except for limited amount of …

Cited by 30 Related articles All 5 versions