Sparks of large audio models: A survey and outlook
This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …
High fidelity neural audio compression
We introduce a state-of-the-art real-time, high-fidelity, audio codec leveraging neural
networks. It consists in a streaming encoder-decoder architecture with quantized latent …
High-fidelity audio compression with improved RVQGAN
Language models have been successfully used to model natural signals, such as
images, speech, and music. A key component of these models is a high quality neural …
SpeechX: Neural codec language model as a versatile speech transformer
Recent advancements in generative speech models based on audio-text prompts have
enabled remarkable innovations like high-quality zero-shot text-to-speech. However …
CMGAN: Conformer-based metric GAN for speech enhancement
R Cao, S Abdulatif, B Yang - arXiv preprint arXiv:2203.15149, 2022 - arxiv.org
Recently, convolution-augmented transformer (Conformer) has achieved promising
performance in automatic speech recognition (ASR) and time-domain speech enhancement …
CASE-Net: Integrating local and non-local attention operations for speech enhancement
Local and non-local attention operations are two ubiquitous operations in the domain of
speech enhancement (SE), and they are effective to generate more discriminative patterns …
ICASSP 2023 acoustic echo cancellation challenge
The ICASSP 2023 Acoustic Echo Cancellation Challenge is intended to stimulate research
in acoustic echo cancellation (AEC), which is an important area of speech enhancement and …
From discrete tokens to high-fidelity audio using multi-band diffusion
Deep generative models can generate high-fidelity audio conditioned on various types of
representations (e.g., mel-spectrograms, Mel-frequency Cepstral Coefficients (MFCC)) …
The VoicePrivacy 2024 Challenge Evaluation Plan
The task of the challenge is to develop a voice anonymization system for speech data which
conceals the speaker's voice identity while protecting linguistic content and emotional states …
FRCRN: Boosting feature representation using frequency recurrence for monaural speech enhancement
S Zhao, B Ma, KN Watcharasupat… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Convolutional recurrent networks (CRN) integrating a convolutional encoder-decoder (CED)
structure and a recurrent structure have achieved promising performance for monaural …