Learning to answer questions in dynamic audio-visual scenarios
In this paper, we focus on the Audio-Visual Question Answering (AVQA) task, which aims to
answer questions regarding different visual objects, sounds, and their associations in …
iQuery: Instruments as queries for audio-visual sound separation
Current audio-visual separation methods share a standard architecture design where an
audio encoder-decoder network is fused with visual encoding features at the encoder …
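The fusion pattern this abstract refers to can be sketched minimally: an audio encoder maps the mixture spectrogram to a bottleneck, visual features are concatenated there, and a decoder predicts a separation mask. The sketch below is illustrative only; all dimensions, weights, and names are assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative, not from the paper).
T, F = 16, 64      # spectrogram frames x frequency bins
D_A, D_V = 32, 32  # audio bottleneck dim, visual feature dim

def relu(x):
    return np.maximum(x, 0.0)

# Random weights stand in for trained parameters.
W_enc = rng.standard_normal((F, D_A)) * 0.1        # audio encoder
W_dec = rng.standard_normal((D_A + D_V, F)) * 0.1  # decoder sees fused features

def separate(mix_spec, visual_feat):
    """Encode the mixture, fuse visual features at the bottleneck,
    decode a sigmoid mask, and apply it to the mixture spectrogram."""
    z = relu(mix_spec @ W_enc)                    # (T, D_A) audio bottleneck
    v = np.broadcast_to(visual_feat, (T, D_V))    # tile visual embedding over time
    fused = np.concatenate([z, v], axis=1)        # (T, D_A + D_V)
    mask = 1.0 / (1.0 + np.exp(-(fused @ W_dec))) # sigmoid mask in [0, 1]
    return mask * mix_spec                        # masked spectrogram estimate

mix = np.abs(rng.standard_normal((T, F)))  # non-negative magnitude spectrogram
vis = rng.standard_normal(D_V)             # one visual embedding for the clip
est = separate(mix, vis)
```

Because the mask lies in [0, 1], the estimate is bounded by the mixture magnitude, which is the usual property of mask-based separation.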
Progressive spatio-temporal perception for audio-visual question answering
The Audio-Visual Question Answering (AVQA) task aims to answer questions about different
visual objects, sounds, and their associations in videos. Such naturally multi-modal videos …
LAVSS: Location-guided audio-visual spatial audio separation
Existing machine learning research has achieved promising results in monaural audio-
visual separation (MAVS). However, most MAVS methods purely consider what the sound …
Subnetwork-To-Go: Elastic Neural Network with Dynamic Training and Customizable Inference
Deploying neural networks to different devices or platforms is in general challenging,
especially when the model size is large or model complexity is high. Although there exist …
DC-NAS: Divide-and-Conquer Neural Architecture Search for Multi-Modal Classification
Neural architecture search-based multi-modal classification (NAS-MMC) methods can
individually obtain the optimal classifier for different multi-modal data sets in an automatic …
Separate in the Speech Chain: Cross-Modal Conditional Audio-Visual Target Speech Extraction
Z Mu, X Yang - arXiv preprint arXiv:2404.12725, 2024 - arxiv.org
The integration of visual cues has revitalized the performance of the target speech extraction
task, elevating it to the forefront of the field. Nevertheless, this multi-modal learning paradigm …
Independency Adversarial Learning for Cross-Modal Sound Separation
Z Lin, Y Ji, Y Yang - Proceedings of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Sound mixture separation is still challenging due to heavy sound overlapping and
disturbance from noise. Unsupervised separation would significantly increase the difficulty …
Boosting Audio Visual Question Answering via Key Semantic-Aware Cues
The Audio Visual Question Answering (AVQA) task aims to answer questions related to
various visual objects, sounds, and their interactions in videos. Such naturally multimodal …
Perceptual synchronization scoring of dubbed content using phoneme-viseme agreement
H Gupta - Proceedings of the IEEE/CVF Winter Conference …, 2024 - openaccess.thecvf.com
Recent works have shown great success in synchronizing lip-movements in a given video
with a dubbed audio stream. However, comparison and efficacy of the synchronization …