Learning Trimodal Relation for Audio-Visual Question Answering with Missing Modality
Recent Audio-Visual Question Answering (AVQA) methods rely on complete visual
and audio input to answer questions accurately. However, in real-world scenarios, issues …
Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge
The goal of the multi-sound source localization task is to localize sound sources from the
mixture individually. While recent multi-sound source localization methods have shown …
Enhancing Audio-Visual Question Answering with Missing Modality via Trans-Modal Associative Learning
We present a novel method for Audio-Visual Question Answering (AVQA) in real-world
scenarios where one modality (audio or visual) can be missing. Inspired by human cognitive …