Beyond mono to binaural: Generating binaural audio from mono audio with depth and cross modal...

Y Wei, D Hu, Y Tian, X Li - arXiv preprint arXiv:2208.09579, 2022 - arxiv.org

Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …

Cited by 49 · Related articles · All 2 versions

Multimodal learning with transformers: A survey

P Xu, X Zhu, DA Clifton - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org

Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …

Cited by 388 · Related articles · All 9 versions

BinauralGrad: A two-stage conditional diffusion probabilistic model for binaural audio synthesis

Y Leng, Z Chen, J Guo, H Liu, J Chen… - Advances in …, 2022 - proceedings.neurips.cc

Binaural audio plays a significant role in constructing immersive augmented and virtual
realities. As it is expensive to record binaural audio from the real world, synthesizing them …

Cited by 49 · Related articles · All 6 versions

[BOOK] Foundation models for natural language processing: Pre-trained language models integrating media

G Paaß, S Giesselbach - 2023 - library.oapen.org

This open access book provides a comprehensive overview of the state of the art in research
and applications of Foundation Models and is intended for readers familiar with basic …

Cited by 27 · Related articles · All 10 versions

LAVSS: Location-guided audio-visual spatial audio separation

Y Ye, W Yang, Y Tian - Proceedings of the IEEE/CVF Winter …, 2024 - openaccess.thecvf.com

Existing machine learning research has achieved promising results in monaural audio-
visual separation (MAVS). However, most MAVS methods purely consider what the sound …

Cited by 5 · Related articles · All 5 versions

Cyclic Learning for Binaural Audio Generation and Localization

Z Li, B Zhao, Y Yuan - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Binaural audio is obtained by simulating the biological structure of human ears which plays
an important role in artificial immersive spaces. A promising approach is to utilize mono …

Ticino: A multi-modal remote sensing dataset for semantic segmentation

MP Barbato, F Piccoli, P Napoletano - Expert Systems with Applications, 2024 - Elsevier

Multi-modal remote sensing (RS) involves the fusion of data from multiple sensors, such as
RGB, Multispectral, Hyperspectral, Light Detection and Ranging, Synthetic Aperture Radar …

Cited by 2 · Related articles · All 2 versions

Multi-space channel representation learning for mono-to-binaural conversion based audio deepfake detection

R Liu, J Zhang, G Gao - Information Fusion, 2024 - Elsevier

Audio deepfake detection (ADD) aims to detect the fake audio generated by text-to-speech
(TTS), and voice conversion (VC), etc., which is an emerging topic. Traditionally we read the …

Cited by 2 · Related articles · All 3 versions

Modality-independent teachers meet weakly-supervised audio-visual event parser

YH Lai, YC Chen, F Wang - Advances in Neural Information …, 2023 - proceedings.neurips.cc

Audio-visual learning has been a major pillar of multi-modal machine learning, where the
community mostly focused on its modality-aligned setting, i.e., the audio …

Cited by 2 · Related articles · All 6 versions

Visual-guided scene-aware audio generation method based on hierarchical feature codec and rendering decision

R Wang, H Cheng, L Ye, Q Zhang - Displays, 2024 - Elsevier

Visually guided spatial sound generation (VGSSG) is a well-suited multimodal learning
method for dealing with recorded videos. However, existing methods are difficult to be …

Cited by 1 · Related articles