DDOS: A MOS prediction framework utilizing domain adaptive pre-training and distribution...

E Cooper, WC Huang, Y Tsao, HM Wang… - Acoustical Science …, 2024 - jstage.jst.go.jp

Evaluating synthetic speech generated by machines is a complicated process, as it involves
judging along multiple dimensions including naturalness, intelligibility, and whether the …

Cited by 4

[PDF] arxiv.org

LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement

Z Qi, X Hu, W Zhou, S Li, H Wu, J Lu… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

Recently, researchers have shown an increasing interest in automatically predicting the
subjective evaluation for speech synthesis systems. This prediction is a challenging task …

Cited by 4 · All 3 versions

[PDF] arxiv.org

Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech

SW Fu, KH Hung, Y Tsao, YCF Wang - arXiv preprint arXiv:2402.16321, 2024 - arxiv.org

Speech quality estimation has recently undergone a paradigm shift from human-hearing
expert designs to machine-learning models. However, current models rely mainly on …

Cited by 3 · All 3 versions

[PDF] arxiv.org

RAMP: Retrieval-Augmented MOS Prediction via Confidence-based Dynamic Weighting

H Wang, S Zhao, X Zheng, Y Qin - arXiv preprint arXiv:2308.16488, 2023 - arxiv.org

Automatic Mean Opinion Score (MOS) prediction is crucial to evaluate the perceptual quality
of the synthetic speech. While recent approaches using pre-trained self-supervised learning …

Cited by 3 · All 5 versions

MSQAT: A multi-dimension non-intrusive speech quality assessment transformer utilizing self-supervised representations

K Shen, D Yan, L Dong - Applied Acoustics, 2023 - Elsevier

Convolutional neural networks (CNNs) have been widely utilized as the main building block
for many non-intrusive speech quality assessment (NISQA) methods. A new trend is to add a …

Cited by 3 · All 2 versions

[PDF] arxiv.org

Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-Supervised Setting

H Yadav, E Cooper, J Yamagishi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

This paper introduces a novel objective function for quality mean opinion score (MOS)
prediction of unseen speech synthesis systems. The proposed function measures the …

Cited by 1 · All 4 versions

SQAT-LD: SPeech Quality Assessment Transformer Utilizing Listener Dependent Modeling for Zero-Shot Out-of-Domain MOS Prediction

K Shen, D Yan, L Dong, Y Ren, X Wu… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

In this paper, we propose the speech quality assessment transformer utilizing listener
dependent modeling (SQAT-LD) mean opinion score (MOS) prediction system, which was …

Cited by 1 · All 2 versions

[PDF] arxiv.org

MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module

O Plátek, O Dušek - arXiv preprint arXiv:2301.07087, 2023 - arxiv.org

We present MooseNet, a trainable speech metric that predicts the listeners' Mean Opinion
Score (MOS). We propose a novel approach where the Probabilistic Linear Discriminative …

Cited by 2 · All 6 versions

[PDF] arxiv.org

SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction

Y Tang, J Shi, Y Wu, Q Jin - arXiv preprint arXiv:2406.10911, 2024 - arxiv.org

In speech generation tasks, human subjective ratings, usually referred to as the opinion
score, are considered the "gold standard" for speech quality evaluation, with the mean …
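The snippet above refers to averaging listeners' opinion scores. As a minimal sketch, assuming a 1 to 5 rating scale and hypothetical utterance IDs (neither is taken from SingMOS), the mean opinion score (MOS) of a sample is the arithmetic mean of the ratings it received:

```python
from statistics import mean

# Hypothetical listener ratings on an assumed 1-5 scale, keyed by
# illustrative utterance IDs (not drawn from any real dataset).
ratings = {
    "utt_001": [4, 5, 3, 4, 4],
    "utt_002": [2, 3, 3, 2, 1],
}

def mean_opinion_score(scores):
    """Return the arithmetic mean of the opinion scores given to one sample."""
    if not scores:
        raise ValueError("at least one rating is required")
    return float(mean(scores))

# Utterance-level MOS: one averaged score per sample.
mos = {utt: mean_opinion_score(s) for utt, s in ratings.items()}
print(mos)
```

MOS predictors such as those listed here are trained to regress values like these directly from audio, so no listeners are needed at evaluation time.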

[PDF] arxiv.org

Evaluation of Speech Representations for MOS prediction

F S. Oliveira, E Casanova, AC Junior, L RS Gris… - … Conference on Text …, 2023 - Springer

In this paper, we evaluate feature extraction models for predicting speech quality. We also
propose a model architecture to compare embeddings of supervised learning and self …

Cited by 1 · All 4 versions