Learning in audio-visual context: A review, analysis, and new perspective
Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …
understanding. To mimic human perception ability, audio-visual learning, aimed at …
Multimodal learning with transformers: A survey
Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …
Binauralgrad: A two-stage conditional diffusion probabilistic model for binaural audio synthesis
Binaural audio plays a significant role in constructing immersive augmented and virtual
realities. As it is expensive to record binaural audio from the real world, synthesizing them …
realities. As it is expensive to record binaural audio from the real world, synthesizing them …
[图书][B] Foundation models for natural language processing: Pre-trained language models integrating media
G Paaß, S Giesselbach - 2023 - library.oapen.org
This open access book provides a comprehensive overview of the state of the art in research
and applications of Foundation Models and is intended for readers familiar with basic …
and applications of Foundation Models and is intended for readers familiar with basic …
Lavss: Location-guided audio-visual spatial audio separation
Existing machine learning research has achieved promising results in monaural audio-
visual separation (MAVS). However, most MAVS methods purely consider what the sound …
visual separation (MAVS). However, most MAVS methods purely consider what the sound …
Cyclic Learning for Binaural Audio Generation and Localization
Z Li, B Zhao, Y Yuan - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Binaural audio is obtained by simulating the biological structure of human ears which plays
an important role in artificial immersive spaces. A promising approach is to utilize mono …
an important role in artificial immersive spaces. A promising approach is to utilize mono …
[HTML][HTML] Ticino: A multi-modal remote sensing dataset for semantic segmentation
Multi-modal remote sensing (RS) involves the fusion of data from multiple sensors, such as
RGB, Multispectral, Hyperspectral, Light Detection and Ranging, Synthetic Aperture Radar …
RGB, Multispectral, Hyperspectral, Light Detection and Ranging, Synthetic Aperture Radar …
Multi-space channel representation learning for mono-to-binaural conversion based audio deepfake detection
R Liu, J Zhang, G Gao - Information Fusion, 2024 - Elsevier
Audio deepfake detection (ADD) aims to detect the fake audio generated by text-to-speech
(TTS), and voice conversion (VC), etc., which is an emerging topic. Traditionally we read the …
(TTS), and voice conversion (VC), etc., which is an emerging topic. Traditionally we read the …
Modality-independent teachers meet weakly-supervised audio-visual event parser
Audio-visual learning has been a major pillar of multi-modal machine learning, where the
community mostly focused on its $\textit {modality-aligned} $ setting, $\textit {ie} $, the audio …
community mostly focused on its $\textit {modality-aligned} $ setting, $\textit {ie} $, the audio …
Visual-guided scene-aware audio generation method based on hierarchical feature codec and rendering decision
R Wang, H Cheng, L Ye, Q Zhang - Displays, 2024 - Elsevier
Visually guided spatial sound generation (VGSSG) is a well-suited multimodal learning
method for dealing with recorded videos. However, existing methods are difficult to be …
method for dealing with recorded videos. However, existing methods are difficult to be …