Learning Trimodal Relation for Audio-Visual Question Answering with Missing Modality

KR Park, HJ Lee, JU Kim - European Conference on Computer Vision, 2025 - Springer
Recent Audio-Visual Question Answering (AVQA) methods rely on complete visual
and audio input to answer questions accurately. However, in real-world scenarios, issues …

Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge

D Kim, SJ Um, S Lee, JU Kim - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
The goal of the multi-sound source localization task is to localize sound sources from the
mixture individually. While recent multi-sound source localization methods have shown …

Enhancing Audio-Visual Question Answering with Missing Modality via Trans-Modal Associative Learning

KR Park, Y Oh, JU Kim - ICASSP 2024-2024 IEEE International …, 2024 - ieeexplore.ieee.org
We present a novel method for Audio-Visual Question Answering (AVQA) in real-world
scenarios where one modality (audio or visual) can be missing. Inspired by human cognitive …