VoiceFixer: Toward general speech restoration with neural vocoder
Speech restoration aims to remove distortions in speech signals. Prior methods mainly focus
on single-task speech restoration (SSR), such as speech denoising or speech declipping …
Neural vocoder is all you need for speech super-resolution
Speech super-resolution (SR) is a task to increase speech sampling rate by generating high-
frequency components. Existing speech SR methods are trained in constrained …
Real-time speech frequency bandwidth extension
In this paper we propose a lightweight model for frequency bandwidth extension of speech
signals, increasing the sampling frequency from 8kHz to 16kHz while restoring the high …
Towards robust speech super-resolution
Speech super-resolution (SR) aims to increase the sampling rate of a given speech signal
by generating high-frequency components. This paper proposes a convolutional neural …
Bandwidth extension is all you need
Speech generation and enhancement have seen recent breakthroughs in quality thanks to
deep learning. These methods typically operate at a limited sampling rate of 16-22kHz due …
Blind audio bandwidth extension: A diffusion-based zero-shot approach
Audio bandwidth extension involves the realistic reconstruction of high-frequency spectra
from bandlimited observations. In cases where the lowpass degradation is unknown, such …
Catch-a-waveform: Learning to generate audio from a single short example
G Greshler, T Shaham… - Advances in Neural …, 2021 - proceedings.neurips.cc
Models for audio generation are typically trained on hours of recordings. Here, we
illustrate that capturing the essence of an audio source is typically possible from as little as a …
BEHM-GAN: Bandwidth extension of historical music using generative adversarial networks
E Moliner, V Välimäki - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
Audio bandwidth extension aims to expand the spectrum of bandlimited audio signals.
Although this topic has been broadly studied during recent years, the particular problem of …
End-to-end LPCNet: A neural vocoder with fully-differentiable LPC estimation
Neural vocoders have recently demonstrated high quality speech synthesis, but typically
require a high computational complexity. LPCNet was proposed as a way to reduce the …
Enabling real-time on-chip audio super resolution for bone-conduction microphones
Voice communication using an air-conduction microphone in noisy environments suffers
from the degradation of speech audibility. Bone-conduction microphones (BCM) are robust …