Fine-grained image-text matching by cross-modal hard aligning network
Z Pan, F Wu, B Zhang - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com
Current state-of-the-art image-text matching methods implicitly align the visual-semantic
fragments, like regions in images and words in sentences, and adopt cross-attention …
Learning semantic relationship among instances for image-text matching
Image-text matching, a bridge connecting image and language, is an important task, which
generally learns a holistic cross-modal embedding to achieve a high-quality semantic …
Cross-modal active complementary learning with self-refining correspondence
Recently, image-text matching has attracted more and more attention from academia and
industry, which is fundamental to understanding the latent correspondence across visual …
Cross-modal semantic enhanced interaction for image-sentence retrieval
Image-sentence retrieval has attracted extensive research attention in multimedia and
computer vision due to its promising application. The key issue lies in jointly learning the …
Cross-Modal Retrieval: A Review of Methodologies, Datasets, and Future Perspectives
Z Han, A Azman, MR Mustaffa, FB Khalid - IEEE Access, 2024 - ieeexplore.ieee.org
With the rapid development of science and technology, all types of mixed media contain
large amounts of data. Traditional single multimedia data can no longer satisfy daily …
MKVSE: Multimodal knowledge enhanced visual-semantic embedding for image-text retrieval
Image-text retrieval aims to take the text (image) query to retrieve the semantically relevant
images (texts), which is fundamental and critical in the search system, online shopping, and …
Efficient token-guided image-text retrieval with consistent multimodal contrastive training
Image-text retrieval is a central problem for understanding the semantic relationship
between vision and language, and serves as the basis for various visual and language …
Neuron-based spiking transmission and reasoning network for robust image-text retrieval
Most of the image-text retrieval methods carry out accurate results using fine-grained
features for feature alignment. However, extracting the robustness features while …
Breaking Through the Noisy Correspondence: A Robust Model for Image-Text Matching
Unleashing the power of image-text matching in real-world applications is hampered by
noisy correspondence. Manually curating high-quality datasets is expensive and time …
AMC: Adaptive multi-expert collaborative network for text-guided image retrieval
Text-guided image retrieval integrates reference image and text feedback as a multimodal
query to search the image corresponding to user intention. Recent approaches employ multi …