ESPnet-TTS: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit

T Hayashi, R Yamamoto, K Inoue… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

This paper introduces a new end-to-end text-to-speech (E2E-TTS) toolkit named ESPnet-
TTS, which is an extension of the open-source speech processing toolkit ESPnet. The toolkit …

Cited by 245 Related articles All 7 versions

ASVspoof: the automatic speaker verification spoofing and countermeasures challenge

Z Wu, J Yamagishi, T Kinnunen… - IEEE Journal of …, 2017 - ieeexplore.ieee.org

Concerns regarding the vulnerability of automatic speaker verification (ASV) technology
against spoofing can undermine confidence in its reliability and form a barrier to exploitation …

Cited by 837 Related articles All 30 versions

Multimodal age and gender estimation for adaptive human-robot interaction: A systematic literature review

HA Younis, NIR Ruhaiyem, AA Badr, AK Abdul-Hassan… - Processes, 2023 - mdpi.com

Identifying the gender of a person and his age by way of speaking is considered a crucial
task in computer vision. It is a very important and active research topic with many areas of …

Cited by 10 Related articles All 8 versions

Speaker identification and verification by combining MFCC and phase information

S Nakagawa, L Wang, S Ohtsuka - IEEE transactions on audio …, 2011 - ieeexplore.ieee.org

In conventional speaker recognition methods based on Mel-frequency cepstral coefficients
(MFCCs), phase information has hitherto been ignored. In this paper, we propose a phase …

Cited by 285 Related articles All 9 versions
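The entry above notes that MFCC-based speaker recognition has ignored phase. As a minimal illustration of what is discarded (not the authors' phase feature, which the truncated abstract does not describe), the sketch below computes both the magnitude spectrum, from which MFCCs are derived, and the per-bin phase of one Hamming-windowed frame. The function name `frame_spectrum` is hypothetical, and a plain DFT stands in for the FFT a real front-end would use.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def frame_spectrum(frame: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (magnitudes, phases) for DFT bins 0..N//2 of a windowed frame.

    Hypothetical helper; O(N^2) DFT for illustration only.
    """
    n = len(frame)
    # Hamming window reduces spectral leakage before the transform
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    mags: List[float] = []
    phases: List[float] = []
    for k in range(n // 2 + 1):
        # X[k] = sum_t x[t] * exp(-j 2 pi k t / N)
        xk = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                 for t, x in enumerate(windowed))
        mags.append(abs(xk))             # magnitude: what MFCC pipelines keep
        phases.append(cmath.phase(xk))   # phase in (-pi, pi]: what they drop
    return mags, phases
```

A cosine and a sine at the same frequency give almost the same magnitude spectrum, yet their phases at the peak bin differ by about pi/2; a magnitude-only feature cannot tell them apart.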

JSUT and JVS: Free Japanese voice corpora for accelerating speech synthesis research

S Takamichi, R Sonobe, K Mitsui, Y Saito… - Acoustical Science …, 2020 - jstage.jst.go.jp

In this paper, we develop two corpora for speech synthesis research. Thanks to
improvements in machine learning techniques, including deep learning, speech synthesis is …

Cited by 62 Related articles All 5 versions

Domain adaptation of DNN acoustic models using knowledge distillation

T Asami, R Masumura, Y Yamaguchi… - … , Speech and Signal …, 2017 - ieeexplore.ieee.org

Constructing deep neural network (DNN) acoustic models from limited training data is an
important issue for the development of automatic speech recognition (ASR) applications that …

Cited by 104 Related articles All 3 versions
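The snippet for the entry above states only the motivation (limited training data); the title says knowledge distillation is used for domain adaptation. The exact setup is not described, so the sketch below shows only the generic distillation objective: the student's softened posteriors are pushed toward the teacher's by minimising their cross-entropy at a temperature T. The function names and the default temperature are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """Cross-entropy between teacher soft targets and student soft posteriors."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))
```

The loss is smallest when the student's logits produce the same distribution as the teacher's. Hinton et al.'s formulation also scales this term by T squared and mixes it with the hard-label loss; both are omitted here for brevity.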

Reverberant speech recognition based on denoising autoencoder

T Ishii, H Komiyama, T Shinozaki, Y Horiuchi… - Interspeech, 2013 - isca-archive.org

Denoising autoencoder is applied to reverberant speech recognition as a noise robust front-
end to reconstruct clean speech spectrum from noisy input. In order to capture context effects …

Cited by 137 Related articles All 5 versions
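The entry above says the denoising autoencoder reconstructs a clean spectrum from noisy input and is designed to capture context effects, but the truncated abstract does not say how. A common way to give a frame-level network temporal context is to splice each frame with its neighbours. The sketch below assumes that approach and shows only the input construction; `splice_frames` is a hypothetical name, and edge frames are repeated at utterance boundaries.

```python
from typing import List

Frame = List[float]


def splice_frames(frames: List[Frame], left: int, right: int) -> List[Frame]:
    """Concatenate each frame with `left` preceding and `right` following frames.

    Out-of-range neighbours are replaced by the nearest edge frame, so every
    output vector has (left + 1 + right) * dim values.
    """
    n = len(frames)
    spliced: List[Frame] = []
    for t in range(n):
        window: Frame = []
        for offset in range(-left, right + 1):
            # clamp the index to [0, n - 1] to repeat the edge frames
            idx = min(max(t + offset, 0), n - 1)
            window.extend(frames[idx])
        spliced.append(window)
    return spliced
```

For example, `splice_frames([[1.0], [2.0], [3.0]], 1, 1)` gives `[[1.0, 1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 3.0, 3.0]]`. In an assumed DAE training setup, the spliced noisy vectors would be the input and the clean centre frame the target.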

Voice liveness detection algorithms based on pop noise caused by human breath for automatic speaker verification

S Shiota, F Villavicencio, J Yamagishi… - … 2015 16th Annual …, 2015 - research.ed.ac.uk

This paper proposes a novel countermeasure framework to detect spoofing attacks to
reduce the vulnerability of automatic speaker verification (ASV) systems. Recently, ASV …

Cited by 107 Related articles All 9 versions

Differences between acoustic characteristics of spontaneous and read speech and their effects on speech recognition performance

M Nakamura, K Iwano, S Furui - Computer Speech & Language, 2008 - Elsevier

Although speech derived from read texts, news broadcasts, and other similar prepared
contexts can be recognized with high accuracy, recognition performance drastically …

Cited by 164 Related articles All 6 versions

Acoustic-to-word attention-based model complemented with character-level CTC-based model

S Ueno, H Inaguma, M Mimura… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org

This paper addresses end-to-end speech recognition which directly maps acoustic features
to a word sequence. The acoustic-to-word model is attractive since it does not require an …

Cited by 77 Related articles All 7 versions
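The entry above describes an acoustic-to-word model complemented by a character-level CTC model; the snippet does not say how the two are combined. As background only, the sketch below shows standard greedy (best-path) CTC decoding: take the most likely label per frame, merge consecutive repeats, then drop blanks. The blank index 0, the function name and the toy alphabet are assumptions for illustration.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      alphabet: Sequence[str]) -> str:
    """Best-path CTC decoding: argmax per frame, collapse repeats, remove blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out: List[str] = []
    prev: Optional[int] = None
    for label in best:
        # emit a label only if it differs from the previous frame and is not blank
        if label != prev and label != BLANK:
            out.append(alphabet[label])
        prev = label
    return "".join(out)
```

With `alphabet = ["<b>", "a", "b"]` and per-frame argmax labels `[1, 1, 0, 1, 2, 2, 0]`, the output is `"aab"`: the repeated `a` is merged, while the blank between the two `a` runs keeps them as separate characters.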