VoiceFixer: Toward general speech restoration with neural vocoder

H Liu, Q Kong, Q Tian, Y Zhao, DL Wang… - arXiv preprint arXiv …, 2021 - arxiv.org
Speech restoration aims to remove distortions in speech signals. Prior methods mainly focus
on single-task speech restoration (SSR), such as speech denoising or speech declipping …

Neural vocoder is all you need for speech super-resolution

H Liu, W Choi, X Liu, Q Kong, Q Tian… - arXiv preprint arXiv …, 2022 - arxiv.org
Speech super-resolution (SR) is a task to increase speech sampling rate by generating
high-frequency components. Existing speech SR methods are trained in constrained …

Real-time speech frequency bandwidth extension

Y Li, M Tagliasacchi, O Rybakov… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
In this paper we propose a lightweight model for frequency bandwidth extension of speech
signals, increasing the sampling frequency from 8kHz to 16kHz while restoring the high …

Towards robust speech super-resolution

H Wang, DL Wang - IEEE/ACM transactions on audio, speech …, 2021 - ieeexplore.ieee.org
Speech super-resolution (SR) aims to increase the sampling rate of a given speech signal
by generating high-frequency components. This paper proposes a convolutional neural …

Bandwidth extension is all you need

J Su, Y Wang, A Finkelstein, Z Jin - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Speech generation and enhancement have seen recent breakthroughs in quality thanks to
deep learning. These methods typically operate at a limited sampling rate of 16-22kHz due …

Blind audio bandwidth extension: A diffusion-based zero-shot approach

E Moliner, F Elvander, V Välimäki - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
Audio bandwidth extension involves the realistic reconstruction of high-frequency spectra
from bandlimited observations. In cases where the lowpass degradation is unknown, such …

Catch-A-Waveform: Learning to generate audio from a single short example

G Greshler, T Shaham… - Advances in Neural …, 2021 - proceedings.neurips.cc
Models for audio generation are typically trained on hours of recordings. Here, we
illustrate that capturing the essence of an audio source is typically possible from as little as a …

BEHM-GAN: Bandwidth extension of historical music using generative adversarial networks

E Moliner, V Välimäki - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
Audio bandwidth extension aims to expand the spectrum of bandlimited audio signals.
Although this topic has been broadly studied during recent years, the particular problem of …

End-to-end LPCNet: A neural vocoder with fully-differentiable LPC estimation

K Subramani, JM Valin, U Isik, P Smaragdis… - arXiv preprint arXiv …, 2022 - arxiv.org
Neural vocoders have recently demonstrated high quality speech synthesis, but typically
require a high computational complexity. LPCNet was proposed as a way to reduce the …

Enabling real-time on-chip audio super resolution for bone-conduction microphones

Y Li, Y Wang, X Liu, Y Shi, S Patel, SF Shih - Sensors, 2022 - mdpi.com
Voice communication using an air-conduction microphone in noisy environments suffers
from the degradation of speech audibility. Bone-conduction microphones (BCM) are robust …