Dissecting deep metric learning losses for image-text retrieval

X Gong, S Mohan, N Dhingra… - Proceedings of the …, 2023 - openaccess.thecvf.com

In this paper, we study a novel problem in egocentric action recognition, which we term as
"Multimodal Generalization" (MMG). MMG aims to study how systems can generalize when …

Cited by 12 Related articles All 6 versions

[PDF] thecvf.com

FELGA: Unsupervised Fragment Embedding for Fine-Grained Cross-Modal Association

Y Zhuo, B Li - Proceedings of the IEEE/CVF Winter …, 2024 - openaccess.thecvf.com

Vision-and-Language Pre-trained (VLP) models have demonstrated their powerful
zero-shot ability in multiple downstream tasks. Most of these models are designed to learn …

Cited by 1 Related articles All 3 versions

[PDF] arxiv.org

Sound of Story: Multi-modal Storytelling with Audio

J Bae, S Jeong, S Kang, N Han, JY Lee, H Kim… - arXiv preprint arXiv …, 2023 - arxiv.org

Storytelling is multi-modal in the real world. When one tells a story, one may use all of the
visualizations and sounds along with the story itself. However, prior studies on storytelling …

Cited by 1 Related articles All 4 versions

[PDF] asu.edu

Novel Deep Learning Algorithms for Enhancing Inference in Cross-Modal Applications

Y Zhuo - 2024 - keep.lib.asu.edu

With the exponential growth of multi-modal data in the field of computer vision, the ability to
do inference effectively among multiple modalities—such as visual, textual, and auditory …

[PDF] openreview.net

Sound of Story: Multi-modal Storytelling with Audio

BAE Jaeyeon, S Jeong, S Kang, N Han, JY Lee… - The 2023 Conference … - openreview.net

Storytelling is multi-modal in the real world. When one tells a story, one may use all of the
visualizations and sounds along with the story itself. However, prior studies on storytelling …