Masked Audio Modeling with CLAP and Multi-Objective Learning
Most existing masked audio modeling (MAM) methods learn audio representations by
masking and reconstructing local spectrogram patches. However, the reconstruction loss …
Background-aware Modeling for Weakly Supervised Sound Event Detection
Nowadays, a common framework for weakly supervised sound event detection (WSSED) is
multiple instance learning (MIL). However, MIL directly optimizes the clip-level classification …
Improving Speech Enhancement Using Audio Tagging Knowledge From Pre-Trained Representations and Multi-Task Learning
S Lin, C Zhang, Y Qian - 2023 IEEE Automatic Speech …, 2023 - ieeexplore.ieee.org
In deep-learning-based speech enhancement (SE), an audio-knowledge-ignorant approach
is often used, which estimates a denoising model to transform the noisy input speech into …
SLIT: Boosting Audio-Text Pre-Training via Multi-Stage Learning and Instruction Tuning
Audio-text pre-training (ATP) has witnessed remarkable strides across a variety of
downstream tasks. Yet, most existing pretrained audio models only specialize in either …
Complete and separate: Conditional separation with missing target source attribute completion
Recent approaches in source separation leverage semantic information about their input
mixtures and constituent sources that when used in conditional separation models can …