ESPnet-TTS: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit

T Hayashi, R Yamamoto, K Inoue… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
This paper introduces a new end-to-end text-to-speech (E2E-TTS) toolkit named ESPnet-TTS,
which is an extension of the open-source speech processing toolkit ESPnet. The toolkit …

ASVspoof: the automatic speaker verification spoofing and countermeasures challenge

Z Wu, J Yamagishi, T Kinnunen… - IEEE Journal of …, 2017 - ieeexplore.ieee.org
Concerns regarding the vulnerability of automatic speaker verification (ASV) technology
against spoofing can undermine confidence in its reliability and form a barrier to exploitation …

Multimodal age and gender estimation for adaptive human-robot interaction: A systematic literature review

HA Younis, NIR Ruhaiyem, AA Badr, AK Abdul-Hassan… - Processes, 2023 - mdpi.com
Identifying the gender of a person and his age by way of speaking is considered a crucial
task in computer vision. It is a very important and active research topic with many areas of …

Speaker identification and verification by combining MFCC and phase information

S Nakagawa, L Wang, S Ohtsuka - IEEE transactions on audio …, 2011 - ieeexplore.ieee.org
In conventional speaker recognition methods based on Mel-frequency cepstral coefficients
(MFCCs), phase information has hitherto been ignored. In this paper, we propose a phase …

JSUT and JVS: Free Japanese voice corpora for accelerating speech synthesis research

S Takamichi, R Sonobe, K Mitsui, Y Saito… - Acoustical Science …, 2020 - jstage.jst.go.jp
In this paper, we develop two corpora for speech synthesis research. Thanks to
improvements in machine learning techniques, including deep learning, speech synthesis is …

Domain adaptation of DNN acoustic models using knowledge distillation

T Asami, R Masumura, Y Yamaguchi… - … , Speech and Signal …, 2017 - ieeexplore.ieee.org
Constructing deep neural network (DNN) acoustic models from limited training data is an
important issue for the development of automatic speech recognition (ASR) applications that …

Reverberant speech recognition based on denoising autoencoder.

T Ishii, H Komiyama, T Shinozaki, Y Horiuchi… - Interspeech, 2013 - isca-archive.org
Denoising autoencoder is applied to reverberant speech recognition as a noise robust front-end
to reconstruct clean speech spectrum from noisy input. In order to capture context effects …

Voice liveness detection algorithms based on pop noise caused by human breath for automatic speaker verification

S Shiota, F Villavicencio, J Yamagishi… - … 2015 16th Annual …, 2015 - research.ed.ac.uk
This paper proposes a novel countermeasure framework to detect spoofing attacks to
reduce the vulnerability of automatic speaker verification (ASV) systems. Recently, ASV …

Differences between acoustic characteristics of spontaneous and read speech and their effects on speech recognition performance

M Nakamura, K Iwano, S Furui - Computer Speech & Language, 2008 - Elsevier
Although speech derived from read texts, news broadcasts, and other similar prepared
contexts can be recognized with high accuracy, recognition performance drastically …

Acoustic-to-word attention-based model complemented with character-level CTC-based model

S Ueno, H Inaguma, M Mimura… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
This paper addresses end-to-end speech recognition which directly maps acoustic features
to a word sequence. The acoustic-to-word model is attractive since it does not require an …