ESPnet-TTS: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit
T Hayashi, R Yamamoto, K Inoue… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
This paper introduces a new end-to-end text-to-speech (E2E-TTS) toolkit named ESPnet-
TTS, which is an extension of the open-source speech processing toolkit ESPnet. The toolkit …
ASVspoof: the automatic speaker verification spoofing and countermeasures challenge
Concerns regarding the vulnerability of automatic speaker verification (ASV) technology
against spoofing can undermine confidence in its reliability and form a barrier to exploitation …
Multimodal age and gender estimation for adaptive human-robot interaction: A systematic literature review
Identifying the gender of a person and his age by way of speaking is considered a crucial
task in computer vision. It is a very important and active research topic with many areas of …
Speaker identification and verification by combining MFCC and phase information
S Nakagawa, L Wang, S Ohtsuka - IEEE transactions on audio …, 2011 - ieeexplore.ieee.org
In conventional speaker recognition methods based on Mel-frequency cepstral coefficients
(MFCCs), phase information has hitherto been ignored. In this paper, we propose a phase …
JSUT and JVS: Free Japanese voice corpora for accelerating speech synthesis research
In this paper, we develop two corpora for speech synthesis research. Thanks to
improvements in machine learning techniques, including deep learning, speech synthesis is …
Domain adaptation of DNN acoustic models using knowledge distillation
T Asami, R Masumura, Y Yamaguchi… - … , Speech and Signal …, 2017 - ieeexplore.ieee.org
Constructing deep neural network (DNN) acoustic models from limited training data is an
important issue for the development of automatic speech recognition (ASR) applications that …
Reverberant speech recognition based on denoising autoencoder
T Ishii, H Komiyama, T Shinozaki, Y Horiuchi… - Interspeech, 2013 - isca-archive.org
Denoising autoencoder is applied to reverberant speech recognition as a noise robust front-
end to reconstruct clean speech spectrum from noisy input. In order to capture context effects …
Voice liveness detection algorithms based on pop noise caused by human breath for automatic speaker verification
This paper proposes a novel countermeasure framework to detect spoofing attacks to
reduce the vulnerability of automatic speaker verification (ASV) systems. Recently, ASV …
Differences between acoustic characteristics of spontaneous and read speech and their effects on speech recognition performance
Although speech derived from read texts, news broadcasts, and other similar prepared
contexts can be recognized with high accuracy, recognition performance drastically …
Acoustic-to-word attention-based model complemented with character-level CTC-based model
This paper addresses end-to-end speech recognition which directly maps acoustic features
to a word sequence. The acoustic-to-word model is attractive since it does not require an …