A review on subjective and objective evaluation of synthetic speech

E Cooper, WC Huang, Y Tsao, HM Wang… - Acoustical Science …, 2024 - jstage.jst.go.jp
Evaluating synthetic speech generated by machines is a complicated process, as it involves
judging along multiple dimensions including naturalness, intelligibility, and whether the …

LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement

Z Qi, X Hu, W Zhou, S Li, H Wu, J Lu… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Recently, researchers have shown an increasing interest in automatically predicting the
subjective evaluation for speech synthesis systems. This prediction is a challenging task …

Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech

SW Fu, KH Hung, Y Tsao, YCF Wang - arXiv preprint arXiv:2402.16321, 2024 - arxiv.org
Speech quality estimation has recently undergone a paradigm shift from human-hearing
expert designs to machine-learning models. However, current models rely mainly on …

RAMP: Retrieval-Augmented MOS Prediction via Confidence-based Dynamic Weighting

H Wang, S Zhao, X Zheng, Y Qin - arXiv preprint arXiv:2308.16488, 2023 - arxiv.org
Automatic Mean Opinion Score (MOS) prediction is crucial to evaluate the perceptual quality
of the synthetic speech. While recent approaches using pre-trained self-supervised learning …

MSQAT: A multi-dimension non-intrusive speech quality assessment transformer utilizing self-supervised representations

K Shen, D Yan, L Dong - Applied Acoustics, 2023 - Elsevier
Convolutional neural networks (CNNs) have been widely utilized as the main building block
for many non-intrusive speech quality assessment (NISQA) methods. A new trend is to add a …

Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-Supervised Setting

H Yadav, E Cooper, J Yamagishi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
This paper introduces a novel objective function for quality mean opinion score (MOS)
prediction of unseen speech synthesis systems. The proposed function measures the …

SQAT-LD: SPeech Quality Assessment Transformer Utilizing Listener Dependent Modeling for Zero-Shot Out-of-Domain MOS Prediction

K Shen, D Yan, L Dong, Y Ren, X Wu… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
In this paper, we propose the speech quality assessment transformer utilizing listener
dependent modeling (SQAT-LD) mean opinion score (MOS) prediction system, which was …

MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module

O Plátek, O Dušek - arXiv preprint arXiv:2301.07087, 2023 - arxiv.org
We present MooseNet, a trainable speech metric that predicts the listeners' Mean Opinion
Score (MOS). We propose a novel approach where the Probabilistic Linear Discriminative …

SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction

Y Tang, J Shi, Y Wu, Q Jin - arXiv preprint arXiv:2406.10911, 2024 - arxiv.org
In speech generation tasks, human subjective ratings, usually referred to as the opinion
score, are considered the "gold standard" for speech quality evaluation, with the mean …
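Several entries above (RAMP, SingMOS) treat the Mean Opinion Score as the prediction target, that is, the mean of individual listeners' opinion scores. The following minimal Python sketch shows that averaging. It assumes a 1-to-5 rating scale and a hypothetical dictionary that maps utterance IDs to lists of listener ratings. The function names are illustrative and do not come from any of the listed papers.

```python
from statistics import mean

def utterance_mos(ratings):
    """MOS for one utterance: the arithmetic mean of its listener ratings (assumed 1-5 scale)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return mean(ratings)

def system_mos(ratings_by_utterance):
    """System-level MOS, computed here as the mean of per-utterance MOS values.

    Assumption: pooling every individual rating of the system is another common choice,
    and it gives different results when utterances have unequal numbers of ratings.
    """
    return mean(utterance_mos(r) for r in ratings_by_utterance.values())

# Hypothetical ratings for one synthesis system.
ratings = {"utt1": [4, 5, 3], "utt2": [2, 3]}
print(utterance_mos(ratings["utt1"]))  # 4
print(system_mos(ratings))             # 3.25
```

MOS prediction models such as those listed here are trained to regress these averaged values directly from audio.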

Evaluation of Speech Representations for MOS prediction

FS Oliveira, E Casanova, AC Junior, LRS Gris… - … Conference on Text …, 2023 - Springer
In this paper, we evaluate feature extraction models for predicting speech quality. We also
propose a model architecture to compare embeddings of supervised learning and self …