A recipe for scaling up text-to-video generation with text-free videos
Diffusion-based text-to-video generation has witnessed impressive progress in the past year
yet still falls behind text-to-image generation. One of the key reasons is the limited scale of …
Plug-and-play regulators for image-text matching
Exploiting fine-grained correspondence and visual-semantic alignments has shown great
potential in image-text matching. Generally, recent approaches first employ a cross-modal …
Enhanced semantic similarity learning framework for image-text matching
Image-text matching is a fundamental task to bridge vision and language. The critical
challenge lies in accurately learning the semantic similarity between these two …
Direction-oriented visual-semantic embedding model for remote sensing image-text retrieval
Q Ma, J Pan, C Bai - IEEE Transactions on Geoscience and …, 2024 - ieeexplore.ieee.org
Image-text retrieval has developed rapidly in recent years. However, it is still a challenge in
remote sensing due to visual-semantic imbalance, which leads to incorrect matching of …
3SHNet: Boosting image–sentence retrieval via visual semantic–spatial self-highlighting
In this paper, we propose a novel visual Semantic-Spatial Self-Highlighting Network (termed
3SHNet) for high-precision, high-efficiency and high-generalization image–sentence …
Reservoir computing transformer for image-text retrieval
Although the attention mechanism in transformers has proven successful in image-text
retrieval tasks, most transformer models suffer from a large number of parameters. Inspired …
Unlocking the Power of Cross-Dimensional Semantic Dependency for Image-Text Matching
Image-text matching, as a fundamental cross-modal task, bridges vision and language. The
key challenge lies in accurately learning the semantic similarity of these two heterogeneous …
Cross-Modal Semantically Augmented Network for Image-Text Matching
T Yao, Y Li, Y Li, Y Zhu, G Wang, J Yue - ACM Transactions on …, 2023 - dl.acm.org
Image-text matching plays an important role in solving the problem of cross-modal
information processing. Since there are nonnegligible semantic differences between …
Knowledge Proxy Intervention for Deconfounded Video Question Answering
Recently, Video Question-Answering (VideoQA) has drawn more and more
attention from both industry and research community. Despite all the success achieved by …
Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching
Image-text matching remains a challenging task due to heterogeneous semantic diversity
across modalities and insufficient distance separability within triplets. Different from previous …