Fusion of self-supervised learned models for MOS prediction

E Cooper, WC Huang, Y Tsao, HM Wang… - Acoustical Science …, 2024 - jstage.jst.go.jp

Evaluating synthetic speech generated by machines is a complicated process, as it involves
judging along multiple dimensions including naturalness, intelligibility, and whether the …

Cited by 17 · Related articles

[PDF] arxiv.org

A study on incorporating Whisper for robust speech assessment

RE Zezario, YW Chen, SW Fu, Y Tsao… - … on Multimedia and …, 2024 - ieeexplore.ieee.org

This research introduces an enhanced version of the multi-objective speech assessment
model, MOSA-Net+, by leveraging the acoustic features from Whisper, a large-scaled …

Cited by 13 · Related articles · All 4 versions

[PDF] arxiv.org

LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement

Z Qi, X Hu, W Zhou, S Li, H Wu, J Lu… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

Recently, researchers have shown an increasing interest in automatically predicting the
subjective evaluation for speech synthesis systems. This prediction is a challenging task …

Cited by 16 · Related articles · All 3 versions

[PDF] arxiv.org

RAMP: Retrieval-Augmented MOS Prediction via Confidence-based Dynamic Weighting

H Wang, S Zhao, X Zheng, Y Qin - arXiv preprint arXiv:2308.16488, 2023 - arxiv.org

Automatic Mean Opinion Score (MOS) prediction is crucial to evaluate the perceptual quality
of the synthetic speech. While recent approaches using pre-trained self-supervised learning …

Cited by 9 · Related articles · All 5 versions

[HTML] aip.org

[HTML] Multi-objective non-intrusive hearing-aid speech assessment model

HT Chiang, SW Fu, HM Wang, Y Tsao… - The Journal of the …, 2024 - pubs.aip.org

Because a reference signal is often unavailable in real-world scenarios, reference-free
speech quality and intelligibility assessment models are important for many speech …

Cited by 3 · Related articles · All 2 versions

[PDF] ieee.org

Coded Speech Quality Measurement by a Non-Intrusive PESQ-DNN

Z Xu, Z Zhao, T Fingscheidt - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org

Wideband codecs such as AMR-WB or EVS are widely used in (mobile) speech
communication. Evaluation of coded speech quality is often performed subjectively by an …

Cited by 5 · Related articles · All 4 versions

[PDF] arxiv.org

MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction

W Zhou, Z Yang, C Chu, S Li, R Dabre… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Automatic Mean Opinion Score (MOS) prediction is employed to evaluate the quality of
synthetic speech. This study extends the application of predicted MOS to the task of Fake …

Cited by 1 · Related articles · All 6 versions

SQAT-LD: SPeech Quality Assessment Transformer Utilizing Listener Dependent Modeling for Zero-Shot Out-of-Domain MOS Prediction

K Shen, D Yan, L Dong, Y Ren, X Wu… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

In this paper, we propose the speech quality assessment transformer utilizing listener
dependent modeling (SQAT-LD) mean opinion score (MOS) prediction system, which was …

Cited by 5 · Related articles · All 2 versions

[PDF] arxiv.org

Investigating content-aware neural text-to-speech MOS prediction using prosodic and linguistic features

A Vioni, G Maniati, N Ellinas, JS Sung… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Current state-of-the-art methods for automatic synthetic speech evaluation are based on
MOS prediction neural models. Such MOS prediction models include MOSNet and LDNet …

Cited by 6 · Related articles · All 6 versions

[PDF] arxiv.org

SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction

Y Tang, J Shi, Y Wu, Q Jin - arXiv preprint arXiv:2406.10911, 2024 - arxiv.org

In speech generation tasks, human subjective ratings, usually referred to as the opinion
score, are considered the "gold standard" for speech quality evaluation, with the mean …

Cited by 7 · Related articles