The VoiceMOS challenge 2022

WC Huang, E Cooper, Y Tsao, HM Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
We present the first edition of the VoiceMOS Challenge, a scientific event that aims to
promote the study of automatic prediction of the mean opinion score (MOS) of synthetic …

Generalization ability of MOS prediction networks

E Cooper, WC Huang, T Toda… - ICASSP 2022 - 2022 …, 2022 - ieeexplore.ieee.org
Automatic methods to predict listener opinions of synthesized speech remain elusive since
listeners, systems being evaluated, characteristics of the speech, and even the instructions …

Deep learning-based non-intrusive multi-objective speech assessment model with cross-domain features

RE Zezario, SW Fu, F Chen, CS Fuh… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org
This study proposes a cross-domain multi-objective speech assessment model, called
MOSA-Net, which can simultaneously estimate the speech quality, intelligibility, and …

Meta-TTS: Meta-learning for few-shot speaker adaptive text-to-speech

SF Huang, CJ Lin, DR Liu, YC Chen… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org
Personalizing a speech synthesis system is a highly desired application, where the system
can generate speech in the user's voice from only a few enrolled recordings. There are two main …

LDNet: Unified listener dependent modeling in MOS prediction for synthetic speech

WC Huang, E Cooper, J Yamagishi… - ICASSP 2022 - 2022 …, 2022 - ieeexplore.ieee.org
An effective approach to automatically predict the subjective rating for synthetic speech is to
train on a listening test dataset with human-annotated scores. Although each speech sample …

A review on subjective and objective evaluation of synthetic speech

E Cooper, WC Huang, Y Tsao, HM Wang… - Acoustical Science …, 2024 - jstage.jst.go.jp
Evaluating synthetic speech generated by machines is a complicated process, as it involves
judging along multiple dimensions including naturalness, intelligibility, and whether the …

SQuId: Measuring speech naturalness in many languages

T Sellam, A Bapna, J Camp… - ICASSP 2023 - 2023 …, 2023 - ieeexplore.ieee.org
Much of text-to-speech research relies on human evaluation. This incurs heavy costs and
slows down the development process, especially in heavily multilingual applications where …

Exploring the influence of fine-tuning data on wav2vec 2.0 model for blind speech quality prediction

H Becerra, A Ragano, A Hines - arXiv preprint arXiv:2204.02135, 2022 - arxiv.org
Recent studies have shown how self-supervised models can produce accurate speech
quality predictions. Speech representations generated by the pre-trained wav2vec 2.0 …

Improving meeting inclusiveness using speech interruption analysis

SW Fu, Y Fan, Y Hosseinkashi, J Gupchup… - Proceedings of the 30th …, 2022 - dl.acm.org
Meetings are a pervasive method of communication within all types of companies and
organizations, and using remote collaboration systems to conduct meetings has increased …

DDOS: A MOS prediction framework utilizing domain adaptive pre-training and distribution of opinion scores

WC Tseng, WT Kao, H Lee - arXiv preprint arXiv:2204.03219, 2022 - arxiv.org
Mean opinion score (MOS) is a typical subjective evaluation metric for speech synthesis
systems. Since collecting MOS is time-consuming, it would be desirable if there were accurate …
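Several entries above concern predicting the mean opinion score (MOS). As a minimal sketch, and not the method of any listed paper, the MOS of one utterance is the arithmetic mean of its listener ratings, and the "distribution of opinion scores" named in the DDOS title can be summarized per utterance as the fraction of listeners who gave each score. The 1-5 rating scale, the function name `mos_and_distribution`, and the example ratings are hypothetical, chosen only for illustration.

```python
from collections import Counter
from statistics import mean


def mos_and_distribution(ratings):
    """Return (MOS, distribution) for one utterance's listener ratings.

    Assumption: ratings are integers on a 1-5 absolute category rating scale.
    The distribution maps each possible score to the fraction of listeners
    who chose it.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r not in range(1, 6) for r in ratings):
        raise ValueError("ratings must be integers in 1..5")

    counts = Counter(ratings)
    n = len(ratings)
    # Include every score on the scale, even ones no listener chose.
    distribution = {score: counts.get(score, 0) / n for score in range(1, 6)}
    # MOS is simply the arithmetic mean of the individual ratings.
    return mean(ratings), distribution


# Hypothetical ratings from five listeners for one synthesized utterance.
mos, dist = mos_and_distribution([4, 5, 3, 4, 4])
```

Keeping the full distribution alongside the mean preserves information about listener disagreement that the single MOS value discards.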