Cross-modal retrieval: a systematic review of methods and future directions

T Wang, F Li, L Zhu, J Li, Z Zhang, HT Shen - arXiv preprint arXiv …, 2023 - arxiv.org
With the exponential surge in diverse multi-modal data, traditional uni-modal retrieval
methods struggle to meet the needs of users seeking access to data across various …

Hierarchical multi-label classification networks

J Wehrmann, R Cerri, R Barros - … conference on machine …, 2018 - proceedings.mlr.press
One of the most challenging machine learning problems is a particular case of data
classification in which classes are hierarchically structured and objects can be assigned to …

Cross-modal scene graph matching for relationship-aware image-text retrieval

S Wang, R Wang, Z Yao, S Shan… - Proceedings of the …, 2020 - openaccess.thecvf.com
Image-text retrieval of natural scenes has been a popular research topic. Since image and
text are heterogeneous cross-modal data, one of the key challenges is how to learn …

Focus your attention: A bidirectional focal attention network for image-text matching

C Liu, Z Mao, AA Liu, T Zhang, B Wang… - Proceedings of the 27th …, 2019 - dl.acm.org
Learning semantic correspondence between image and text is significant as it bridges the
semantic gap between vision and language. The key challenge is to accurately find and …

GraDual: Graph-based dual-modal representation for image-text matching

S Long, SC Han, X Wan, J Poon - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Image-text retrieval is a challenging task. It aims to measure the visual-semantic
correspondence between an image and a text caption. This is tough mainly because the …

ACMM: Aligned cross-modal memory for few-shot image and sentence matching

Y Huang, L Wang - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com
Image and sentence matching has drawn much attention recently, but due to the lack of
sufficient pairwise data for training, most previous methods still cannot well associate those …

Adaptive cross-modal embeddings for image-text alignment

J Wehrmann, C Kolling, RC Barros - … of the AAAI conference on artificial …, 2020 - ojs.aaai.org
… a using an embedding vector of an instance from modality b. Such an adaptation is
designed to filter and enhance important information across internal features, allowing for …

Metric learning with HORDE: High-order regularizer for deep embeddings

P Jacob, D Picard, A Histace… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Learning an effective similarity measure between image representations is key to the
success of recent advances in visual search tasks (e.g., verification or zero-shot learning) …

Cross-modal image-text retrieval with semantic consistency

H Chen, G Ding, Z Lin, S Zhao, J Han - Proceedings of the 27th ACM …, 2019 - dl.acm.org
Cross-modal image-text retrieval has been a long-standing challenge in the multimedia
community. Existing methods explore various complicated embedding spaces to assess the …

Language-agnostic visual-semantic embeddings

J Wehrmann, DM Souza, MA Lopes… - Proceedings of the …, 2019 - openaccess.thecvf.com
This paper proposes a framework for training language-invariant cross-modal retrieval
models. We also introduce a novel character-based word-embedding approach, allowing …