Sparks of large audio models: A survey and outlook

S Latif, M Shoukat, F Shamshad, M Usama… - arXiv preprint arXiv …, 2023 - arxiv.org
This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …

High fidelity neural audio compression

A Défossez, J Copet, G Synnaeve, Y Adi - arXiv preprint arXiv:2210.13438, 2022 - arxiv.org
We introduce a state-of-the-art real-time, high-fidelity, audio codec leveraging neural
networks. It consists in a streaming encoder-decoder architecture with quantized latent …

High-fidelity audio compression with improved RVQGAN

R Kumar, P Seetharaman, A Luebs… - Advances in Neural …, 2024 - proceedings.neurips.cc
Language models have been successfully used to model natural signals, such as
images, speech, and music. A key component of these models is a high quality neural …

SpeechX: Neural codec language model as a versatile speech transformer

X Wang, M Thakker, Z Chen, N Kanda… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Recent advancements in generative speech models based on audio-text prompts have
enabled remarkable innovations like high-quality zero-shot text-to-speech. However …

CMGAN: Conformer-based metric GAN for speech enhancement

R Cao, S Abdulatif, B Yang - arXiv preprint arXiv:2203.15149, 2022 - arxiv.org
Recently, convolution-augmented transformer (Conformer) has achieved promising
performance in automatic speech recognition (ASR) and time-domain speech enhancement …

CASE-Net: Integrating local and non-local attention operations for speech enhancement

X Xu, W Tu, Y Yang - Speech Communication, 2023 - Elsevier
Local and non-local attention operations are two ubiquitous operations in the domain of
speech enhancement (SE), and they are effective to generate more discriminative patterns …

ICASSP 2023 acoustic echo cancellation challenge

R Cutler, A Saabas, T Pärnamaa… - IEEE Open Journal …, 2024 - ieeexplore.ieee.org
The ICASSP 2023 Acoustic Echo Cancellation Challenge is intended to stimulate research
in acoustic echo cancellation (AEC), which is an important area of speech enhancement and …

From discrete tokens to high-fidelity audio using multi-band diffusion

R San Roman, Y Adi, A Deleforge… - Advances in neural …, 2023 - proceedings.neurips.cc
Deep generative models can generate high-fidelity audio conditioned on various types of
representations (e.g., mel-spectrograms, Mel-frequency Cepstral Coefficients (MFCC)) …

The VoicePrivacy 2024 Challenge Evaluation Plan

N Tomashenko, X Miao, P Champion, S Meyer… - arXiv preprint arXiv …, 2024 - arxiv.org
The task of the challenge is to develop a voice anonymization system for speech data which
conceals the speaker's voice identity while protecting linguistic content and emotional states …

FRCRN: Boosting feature representation using frequency recurrence for monaural speech enhancement

S Zhao, B Ma, KN Watcharasupat… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Convolutional recurrent networks (CRN) integrating a convolutional encoder-decoder (CED)
structure and a recurrent structure have achieved promising performance for monaural …