A review on subjective and objective evaluation of synthetic speech

E Cooper, WC Huang, Y Tsao, HM Wang… - Acoustical Science …, 2024 - jstage.jst.go.jp
Evaluating synthetic speech generated by machines is a complicated process, as it involves
judging along multiple dimensions including naturalness, intelligibility, and whether the …

A study on incorporating Whisper for robust speech assessment

RE Zezario, YW Chen, SW Fu, Y Tsao… - … on Multimedia and …, 2024 - ieeexplore.ieee.org
This research introduces an enhanced version of the multi-objective speech assessment
model, MOSA-Net+, by leveraging the acoustic features from Whisper, a large-scale …

LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement

Z Qi, X Hu, W Zhou, S Li, H Wu, J Lu… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Recently, researchers have shown an increasing interest in automatically predicting the
subjective evaluation for speech synthesis systems. This prediction is a challenging task …

RAMP: Retrieval-Augmented MOS Prediction via Confidence-based Dynamic Weighting

H Wang, S Zhao, X Zheng, Y Qin - arXiv preprint arXiv:2308.16488, 2023 - arxiv.org
Automatic Mean Opinion Score (MOS) prediction is crucial to evaluate the perceptual quality
of the synthetic speech. While recent approaches using pre-trained self-supervised learning …

Multi-objective non-intrusive hearing-aid speech assessment model

HT Chiang, SW Fu, HM Wang, Y Tsao… - The Journal of the …, 2024 - pubs.aip.org
Because a reference signal is often unavailable in real-world scenarios, reference-free
speech quality and intelligibility assessment models are important for many speech …

Coded Speech Quality Measurement by a Non-Intrusive PESQ-DNN

Z Xu, Z Zhao, T Fingscheidt - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
Wideband codecs such as AMR-WB or EVS are widely used in (mobile) speech
communication. Evaluation of coded speech quality is often performed subjectively by an …

MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction

W Zhou, Z Yang, C Chu, S Li, R Dabre… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Automatic Mean Opinion Score (MOS) prediction is employed to evaluate the quality of
synthetic speech. This study extends the application of predicted MOS to the task of Fake …

SQAT-LD: SPeech Quality Assessment Transformer Utilizing Listener Dependent Modeling for Zero-Shot Out-of-Domain MOS Prediction

K Shen, D Yan, L Dong, Y Ren, X Wu… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
In this paper, we propose the speech quality assessment transformer utilizing listener
dependent modeling (SQAT-LD) mean opinion score (MOS) prediction system, which was …

Investigating content-aware neural text-to-speech MOS prediction using prosodic and linguistic features

A Vioni, G Maniati, N Ellinas, JS Sung… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Current state-of-the-art methods for automatic synthetic speech evaluation are based on
MOS prediction neural models. Such MOS prediction models include MOSNet and LDNet …

SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction

Y Tang, J Shi, Y Wu, Q Jin - arXiv preprint arXiv:2406.10911, 2024 - arxiv.org
In speech generation tasks, human subjective ratings, usually referred to as the opinion
score, are considered the "gold standard" for speech quality evaluation, with the mean …