An evaluation of synthetic speech using the PESQ measure

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org

Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

Cited by 323 · Related articles · All 8 versions

MOSNet: Deep learning based objective assessment for voice conversion

CC Lo, SW Fu, WC Huang, X Wang… - arXiv preprint arXiv …, 2019 - arxiv.org

Existing objective evaluation metrics for voice conversion (VC) are not always correlated
with human perception. Therefore, training VC models with such criteria may not effectively …

Cited by 260 · Related articles · All 14 versions
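The MOSNet snippet hinges on whether an objective metric agrees with human perception. That agreement is commonly quantified by correlating metric scores with listener MOS, using Pearson's linear correlation (LCC) and Spearman's rank correlation (SRCC). Below is a minimal, stdlib-only Python sketch of those two coefficients. The function names are hypothetical helpers and are not part of MOSNet or any toolkit. Ties in the Spearman computation are handled with average ranks.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient (LCC) between two score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Unnormalized covariance and spreads; the 1/n factors cancel in the ratio.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In practice `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same quantities; the hand-rolled version only makes the arithmetic explicit.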

A review on subjective and objective evaluation of synthetic speech

E Cooper, WC Huang, Y Tsao, HM Wang… - Acoustical Science …, 2024 - jstage.jst.go.jp

Evaluating synthetic speech generated by machines is a complicated process, as it involves
judging along multiple dimensions including naturalness, intelligibility, and whether the …

Cited by 4 · Related articles

The limits of the Mean Opinion Score for speech synthesis evaluation

S Le Maguer, S King, N Harte - Computer Speech & Language, 2024 - Elsevier

The release of WaveNet and Tacotron has forever transformed the speech synthesis
landscape. Thanks to these game-changing innovations, the quality of synthetic speech has …

Cited by 4 · Related articles · All 4 versions
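Because this entry questions the Mean Opinion Score itself, it may help to recall what a MOS is numerically: the arithmetic mean of listener ratings, usually on a 1 to 5 absolute category rating scale, often reported together with a confidence interval. The following is a minimal Python sketch that assumes a 1 to 5 scale and uses a normal approximation for the interval. A t-distribution is more appropriate for small listener panels. `mos_with_ci` is a hypothetical helper name.

```python
import math
import statistics
from typing import Sequence


def mos_with_ci(ratings: Sequence[int], confidence: float = 0.95) -> tuple[float, float]:
    """Return (MOS, half-width of a normal-approximation confidence interval).

    Assumption: ratings are on a 1-5 scale and the sample is large enough
    for the normal approximation to be reasonable.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be in (0, 1)")
    mos = float(statistics.mean(ratings))
    sd = statistics.stdev(ratings)  # sample standard deviation (n - 1 denominator)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95 %
    half_width = z * sd / math.sqrt(len(ratings))
    return mos, half_width
```

Reporting the half-width alongside the mean makes it visible when two systems' MOS values are too close to separate with the given number of ratings.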

On the use of WaveNet as a statistical vocoder

N Adiga, V Tsiaras, Y Stylianou - 2018 IEEE International …, 2018 - ieeexplore.ieee.org

In this paper, we explore the possibility of using the WaveNet architecture as a statistical
vocoder. In that case, the generation of speech waveforms is locally conditioned only by …

Cited by 44 · Related articles · All 5 versions

Audio similarity is unreliable as a proxy for audio quality

P Manocha, Z Jin, A Finkelstein - arXiv preprint arXiv:2206.13411, 2022 - arxiv.org

Many audio processing tasks require perceptual assessment. However, the time and
expense of obtaining "gold standard" human judgments limit the availability of such data …

Cited by 8 · Related articles · All 7 versions

Speech Enhancement for Noise-Robust Speech Synthesis Using Wasserstein GAN

N Adiga, Y Pantazis, V Tsiaras, Y Stylianou - INTERSPEECH, 2019 - isca-archive.org

The quality of speech synthesis systems can be significantly deteriorated by the presence of
background noise in the recordings. Despite the existence of speech enhancement …

Cited by 25 · Related articles · All 5 versions

A hierarchical predictor of synthetic speech naturalness using neural networks

T Yoshimura, GE Henter, O Watts, M Wester… - Interspeech …, 2016 - research.ed.ac.uk

A problem when developing and tuning speech synthesis systems is that there is no well-established
method of automatically rating the quality of the synthetic speech. This research …

Cited by 30 · Related articles · All 8 versions

Listeners' weighting of acoustic cues to synthetic speech naturalness: A multidimensional scaling analysis

C Mayo, RAJ Clark, S King - Speech Communication, 2011 - Elsevier

The quality of current commercial speech synthesis systems is now so high that system
improvements are being made at subtle sub- and supra-segmental levels. Human perceptual …

Cited by 49 · Related articles · All 6 versions

Non-Intrusive Speech Quality Assessment with Transfer Learning and Subject-Specific Scaling

N Nessler, M Cernak, P Prandoni, P Mainar - Interspeech, 2021 - isca-archive.org

In communication systems, it is crucial to estimate the perceived quality of audio and
speech. The industrial standards for many years have been PESQ, 3QUEST, and POLQA …

Cited by 9 · Related articles · All 5 versions