The VoiceMOS Challenge 2022
We present the first edition of the VoiceMOS Challenge, a scientific event that aims to
promote the study of automatic prediction of the mean opinion score (MOS) of synthetic …
Generalization ability of MOS prediction networks
Automatic methods to predict listener opinions of synthesized speech remain elusive since
listeners, systems being evaluated, characteristics of the speech, and even the instructions …
Deep learning-based non-intrusive multi-objective speech assessment model with cross-domain features
This study proposes a cross-domain multi-objective speech assessment model, called
MOSA-Net, which can simultaneously estimate the speech quality, intelligibility, and …
Meta-TTS: Meta-learning for few-shot speaker adaptive text-to-speech
Personalizing a speech synthesis system is a highly desired application, where the system
can generate speech with the user's voice with rare enrolled recordings. There are two main …
LDNet: Unified listener dependent modeling in MOS prediction for synthetic speech
WC Huang, E Cooper, J Yamagishi… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
An effective approach to automatically predict the subjective rating for synthetic speech is to
train on a listening test dataset with human-annotated scores. Although each speech sample …
A review on subjective and objective evaluation of synthetic speech
Evaluating synthetic speech generated by machines is a complicated process, as it involves
judging along multiple dimensions including naturalness, intelligibility, and whether the …
SQuId: Measuring speech naturalness in many languages
Much of text-to-speech research relies on human evaluation. This incurs heavy costs and
slows down the development process, especially in heavily multilingual applications where …
Exploring the influence of fine-tuning data on wav2vec 2.0 model for blind speech quality prediction
Recent studies have shown how self-supervised models can produce accurate speech
quality predictions. Speech representations generated by the pre-trained wav2vec 2.0 …
Improving meeting inclusiveness using speech interruption analysis
Meetings are a pervasive method of communication within all types of companies and
organizations, and using remote collaboration systems to conduct meetings has increased …
DDOS: A MOS prediction framework utilizing domain adaptive pre-training and distribution of opinion scores
Mean opinion score (MOS) is a typical subjective evaluation metric for speech synthesis
systems. Since collecting MOS is time-consuming, it would be desirable if there are accurate …