Utilizing self-supervised representations for MOS prediction

The VoiceMOS Challenge 2022

WC Huang, E Cooper, Y Tsao, HM Wang… - arXiv preprint arXiv …, 2022 - arxiv.org

We present the first edition of the VoiceMOS Challenge, a scientific event that aims to
promote the study of automatic prediction of the mean opinion score (MOS) of synthetic …

Cited by 92 · Related articles · All 9 versions

Generalization ability of MOS prediction networks

E Cooper, WC Huang, T Toda… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Automatic methods to predict listener opinions of synthesized speech remain elusive since
listeners, systems being evaluated, characteristics of the speech, and even the instructions …

Cited by 120 · Related articles · All 5 versions

Deep learning-based non-intrusive multi-objective speech assessment model with cross-domain features

RE Zezario, SW Fu, F Chen, CS Fuh… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org

This study proposes a cross-domain multi-objective speech assessment model, called
MOSA-Net, which can simultaneously estimate the speech quality, intelligibility, and …

Cited by 62 · Related articles · All 7 versions

Meta-TTS: Meta-learning for few-shot speaker adaptive text-to-speech

SF Huang, CJ Lin, DR Liu, YC Chen… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org

Personalizing a speech synthesis system is a highly desired application, where the system
can generate speech with the user's voice with rare enrolled recordings. There are two main …

Cited by 52 · Related articles · All 5 versions

LDNet: Unified listener dependent modeling in MOS prediction for synthetic speech

WC Huang, E Cooper, J Yamagishi… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

An effective approach to automatically predict the subjective rating for synthetic speech is to
train on a listening test dataset with human-annotated scores. Although each speech sample …

Cited by 52 · Related articles · All 4 versions

A review on subjective and objective evaluation of synthetic speech

E Cooper, WC Huang, Y Tsao, HM Wang… - Acoustical Science …, 2024 - jstage.jst.go.jp

Evaluating synthetic speech generated by machines is a complicated process, as it involves
judging along multiple dimensions including naturalness, intelligibility, and whether the …

Cited by 4 · Related articles

SQuId: Measuring speech naturalness in many languages

T Sellam, A Bapna, J Camp… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Much of text-to-speech research relies on human evaluation. This incurs heavy costs and
slows down the development process, especially in heavily multilingual applications where …

Cited by 11 · Related articles · All 4 versions

Exploring the influence of fine-tuning data on wav2vec 2.0 model for blind speech quality prediction

H Becerra, A Ragano, A Hines - arXiv preprint arXiv:2204.02135, 2022 - arxiv.org

Recent studies have shown how self-supervised models can produce accurate speech
quality predictions. Speech representations generated by the pre-trained wav2vec 2.0 …

Cited by 17 · Related articles · All 7 versions

Improving meeting inclusiveness using speech interruption analysis

SW Fu, Y Fan, Y Hosseinkashi, J Gupchup… - Proceedings of the 30th …, 2022 - dl.acm.org

Meetings are a pervasive method of communication within all types of companies and
organizations, and using remote collaboration systems to conduct meetings has increased …

Cited by 9 · Related articles · All 4 versions

DDOS: A MOS prediction framework utilizing domain adaptive pre-training and distribution of opinion scores

WC Tseng, WT Kao, H Lee - arXiv preprint arXiv:2204.03219, 2022 - arxiv.org

Mean opinion score (MOS) is a typical subjective evaluation metric for speech synthesis
systems. Since collecting MOS is time-consuming, it would be desirable if there are accurate …

Cited by 14 · Related articles · All 9 versions