MMG-Ego4D: Multimodal Generalization in Egocentric Action Recognition

X Gong, S Mohan, N Dhingra… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this paper, we study a novel problem in egocentric action recognition, which we term as
"Multimodal Generalization" (MMG). MMG aims to study how systems can generalize when …

FELGA: Unsupervised Fragment Embedding for Fine-Grained Cross-Modal Association

Y Zhuo, B Li - Proceedings of the IEEE/CVF Winter …, 2024 - openaccess.thecvf.com
Vision-and-Language Pre-trained (VLP) models have demonstrated their powerful
zero-shot ability in multiple downstream tasks. Most of these models are designed to learn …

Sound of Story: Multi-modal Storytelling with Audio

J Bae, S Jeong, S Kang, N Han, JY Lee, H Kim… - arXiv preprint arXiv …, 2023 - arxiv.org
Storytelling is multi-modal in the real world. When one tells a story, one may use all of the
visualizations and sounds along with the story itself. However, prior studies on storytelling …

Novel Deep Learning Algorithms for Enhancing Inference in Cross-Modal Applications

Y Zhuo - 2024 - keep.lib.asu.edu
With the exponential growth of multi-modal data in the field of computer vision, the ability to
do inference effectively among multiple modalities—such as visual, textual, and auditory …

Sound of Story: Multi-modal Storytelling with Audio

BAE Jaeyeon, S Jeong, S Kang, N Han, JY Lee… - The 2023 Conference … - openreview.net
Storytelling is multi-modal in the real world. When one tells a story, one may use all of the
visualizations and sounds along with the story itself. However, prior studies on storytelling …