Learning semantic relationship among instances for image-text matching

Z Fu, Z Mao, Y Song, Y Zhang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Image-text matching, a bridge connecting image and language, is an important task, which
generally learns a holistic cross-modal embedding to achieve a high-quality semantic …

Reservoir computing transformer for image-text retrieval

W Li, Z Ma, LJ Deng, P Wang, J Shi, X Fan - Proceedings of the 31st …, 2023 - dl.acm.org
Although the attention mechanism in transformers has proven successful in image-text
retrieval tasks, most transformer models suffer from a large number of parameters. Inspired …

SEMScene: Semantic-consistency enhanced multi-level scene graph matching for image-text retrieval

Y Liu, X Yuan, H Li, Z Tan, J Huang, J Xiao… - ACM Transactions on …, 2024 - dl.acm.org
Image-text retrieval, a fundamental cross-modal task, performs similarity reasoning for
images and texts. The primary challenge for image-text retrieval is cross-modal semantic …

MetaSQL: A generate-then-rank framework for natural language to SQL translation

Y Fan, Z He, T Ren, C Huang, Y Jing, K Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The Natural Language Interface to Databases (NLIDB) empowers non-technical users with
database access through intuitive natural language (NL) interactions. Advanced …

Cross-modal independent matching network for image-text retrieval

X Ke, B Chen, X Yang, Y Cai, H Liu, W Guo - Pattern Recognition, 2025 - Elsevier
Image-text retrieval serves as a bridge connecting vision and language. Mainstream modal
cross matching methods can effectively perform cross-modal interactions with high …

DCL-net: Dual-level correlation learning network for image–text retrieval

Z Liu, A Li, J Xu, D Shi - Computers and Electrical Engineering, 2025 - Elsevier
Due to the inconsistency in feature representations between different modalities, known as
the “Heterogeneous gap”, image–text retrieval (ITR) is a challenging task. To bridge this …

Negative sample is negative in its own way: Tailoring negative sentences for image-text retrieval

Z Fan, Z Wei, Z Li, S Wang, J Fan - arXiv preprint arXiv:2111.03349, 2021 - arxiv.org
A matching model is essential for an image-text retrieval framework. Existing research usually
trains the model with a triplet loss and explores various strategies to retrieve hard negative …

A unified continuous learning framework for multi-modal knowledge discovery and pre-training

Z Fan, Z Wei, J Chen, S Wang, Z Li, J Xu… - arXiv preprint arXiv …, 2022 - arxiv.org
Multi-modal pre-training and knowledge discovery are two important research topics in
multi-modal machine learning. Nevertheless, no existing work attempts to link …

Joint Intra & Inter-Grained Reasoning: A New Look Into Semantic Consistency of Image-Text Retrieval

R Pan, H Yang, C Li, J Yang - IEEE Transactions on Multimedia, 2023 - ieeexplore.ieee.org
Multimodal understanding aims at constructing semantic correlations among modalities of
data while performing various downstream tasks. As one of the primary multimodal …

Improving Image-Text Matching by Integrating Word Sense Disambiguation

X Pu, P Yang, L Yuan, X Gao - IEEE Signal Processing Letters, 2024 - ieeexplore.ieee.org
This letter presents a novel approach to enhance image-text matching by incorporating word
sense disambiguation (WSD) within the text encoder. Our method explicitly models the …