Cross-modal retrieval: a systematic review of methods and future directions
With the exponential surge in diverse multi-modal data, traditional uni-modal retrieval
methods struggle to meet the needs of users seeking access to data across various …
Hierarchical multi-label classification networks
One of the most challenging machine learning problems is a particular case of data
classification in which classes are hierarchically structured and objects can be assigned to …
Cross-modal scene graph matching for relationship-aware image-text retrieval
Image-text retrieval of natural scenes has been a popular research topic. Since image and
text are heterogeneous cross-modal data, one of the key challenges is how to learn …
Focus your attention: A bidirectional focal attention network for image-text matching
Learning semantic correspondence between image and text is significant as it bridges the
semantic gap between vision and language. The key challenge is to accurately find and …
GraDual: Graph-based dual-modal representation for image-text matching
Image-text retrieval task is a challenging task. It aims to measure the visual-semantic
correspondence between an image and a text caption. This is tough mainly because the …
ACMM: Aligned cross-modal memory for few-shot image and sentence matching
Image and sentence matching has drawn much attention recently, but due to the lack of
sufficient pairwise data for training, most previous methods still cannot well associate those …
Adaptive cross-modal embeddings for image-text alignment
Abstract a using an embedding vector of an instance from modality b. Such an adaptation is
designed to filter and enhance important information across internal features, allowing for …
Metric learning with HORDE: High-order regularizer for deep embeddings
Learning an effective similarity measure between image representations is key to the
success of recent advances in visual search tasks (e.g., verification or zero-shot learning) …
Cross-modal image-text retrieval with semantic consistency
Cross-modal image-text retrieval has been a long-standing challenge in the multimedia
community. Existing methods explore various complicated embedding spaces to assess the …
Language-agnostic visual-semantic embeddings
J Wehrmann, DM Souza, MA Lopes… - Proceedings of the …, 2019 - openaccess.thecvf.com
This paper proposes a framework for training language-invariant cross-modal retrieval
models. We also introduce a novel character-based word-embedding approach, allowing …