An overview of voice conversion and its challenges: From statistical modeling to deep learning
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …
MOSNet: Deep learning based objective assessment for voice conversion
Existing objective evaluation metrics for voice conversion (VC) are not always correlated
with human perception. Therefore, training VC models with such criteria may not effectively …
A review on subjective and objective evaluation of synthetic speech
Evaluating synthetic speech generated by machines is a complicated process, as it involves
judging along multiple dimensions including naturalness, intelligibility, and whether the …
The limits of the Mean Opinion Score for speech synthesis evaluation
The release of WaveNet and Tacotron has forever transformed the speech synthesis
landscape. Thanks to these game-changing innovations, the quality of synthetic speech has …
On the use of WaveNet as a statistical vocoder
N Adiga, V Tsiaras, Y Stylianou - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
In this paper, we explore the possibility of using the WaveNet architecture as a statistical
vocoder. In that case, the generation of speech waveforms is locally conditioned only by …
Audio similarity is unreliable as a proxy for audio quality
Many audio processing tasks require perceptual assessment. However, the time and
expense of obtaining "gold standard" human judgments limit the availability of such data …
Speech Enhancement for Noise-Robust Speech Synthesis Using Wasserstein GAN
The quality of speech synthesis systems can be significantly deteriorated by the presence of
background noise in the recordings. Despite the existence of speech enhancement …
A hierarchical predictor of synthetic speech naturalness using neural networks
A problem when developing and tuning speech synthesis systems is that there is no well-
established method of automatically rating the quality of the synthetic speech. This research …
Listeners' weighting of acoustic cues to synthetic speech naturalness: A multidimensional scaling analysis
The quality of current commercial speech synthesis systems is now so high that system
improvements are being made at subtle sub- and supra-segmental levels. Human perceptual …
Non-Intrusive Speech Quality Assessment with Transfer Learning and Subject-Specific Scaling
In communication systems, it is crucial to estimate the perceived quality of audio and
speech. The industrial standards for many years have been PESQ, 3QUEST, and POLQA …