ICASSP 2023 deep noise suppression challenge

S Latif, M Shoukat, F Shamshad, M Usama… - arXiv preprint arXiv …, 2023 - arxiv.org

This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …

Cited by 22 · Related articles · All 4 versions

High fidelity neural audio compression

A Défossez, J Copet, G Synnaeve, Y Adi - arXiv preprint arXiv:2210.13438, 2022 - arxiv.org

We introduce a state-of-the-art real-time, high-fidelity, audio codec leveraging neural
networks. It consists in a streaming encoder-decoder architecture with quantized latent …

Cited by 561 · Related articles · All 3 versions

High-fidelity audio compression with improved RVQGAN

R Kumar, P Seetharaman, A Luebs… - Advances in Neural …, 2024 - proceedings.neurips.cc

Language models have been successfully used to model natural signals, such as
images, speech, and music. A key component of these models is a high quality neural …

Cited by 180 · Related articles · All 5 versions

SpeechX: Neural codec language model as a versatile speech transformer

X Wang, M Thakker, Z Chen, N Kanda… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Recent advancements in generative speech models based on audio-text prompts have
enabled remarkable innovations like high-quality zero-shot text-to-speech. However …

Cited by 54 · Related articles · All 2 versions

CMGAN: Conformer-based metric GAN for speech enhancement

R Cao, S Abdulatif, B Yang - arXiv preprint arXiv:2203.15149, 2022 - arxiv.org

Recently, convolution-augmented transformer (Conformer) has achieved promising
performance in automatic speech recognition (ASR) and time-domain speech enhancement …

Cited by 102 · Related articles · All 7 versions

CASE-Net: Integrating local and non-local attention operations for speech enhancement

X Xu, W Tu, Y Yang - Speech Communication, 2023 - Elsevier

Local and non-local attention operations are two ubiquitous operations in the domain of
speech enhancement (SE), and they are effective to generate more discriminative patterns …

Cited by 15 · Related articles · All 2 versions

ICASSP 2023 acoustic echo cancellation challenge

R Cutler, A Saabas, T Pärnamaa… - IEEE Open Journal …, 2024 - ieeexplore.ieee.org

The ICASSP 2023 Acoustic Echo Cancellation Challenge is intended to stimulate research
in acoustic echo cancellation (AEC), which is an important area of speech enhancement and …

Cited by 85 · Related articles · All 10 versions

From discrete tokens to high-fidelity audio using multi-band diffusion

R San Roman, Y Adi, A Deleforge… - Advances in neural …, 2023 - proceedings.neurips.cc

Deep generative models can generate high-fidelity audio conditioned on various types of
representations (e.g., mel-spectrograms, Mel-frequency Cepstral Coefficients (MFCC)) …

Cited by 17 · Related articles · All 8 versions

The VoicePrivacy 2024 Challenge Evaluation Plan

N Tomashenko, X Miao, P Champion, S Meyer… - arXiv preprint arXiv …, 2024 - arxiv.org

The task of the challenge is to develop a voice anonymization system for speech data which
conceals the speaker's voice identity while protecting linguistic content and emotional states …

Cited by 90 · Related articles · All 23 versions

FRCRN: Boosting feature representation using frequency recurrence for monaural speech enhancement

S Zhao, B Ma, KN Watcharasupat… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Convolutional recurrent networks (CRN) integrating a convolutional encoder-decoder (CED)
structure and a recurrent structure have achieved promising performance for monaural …

Cited by 75 · Related articles · All 3 versions