Speech technology for healthcare: Opportunities, challenges, and state of the art

S Latif, J Qadir, A Qayyum, M Usama… - IEEE Reviews in …, 2020 - ieeexplore.ieee.org
Speech technology is not appropriately explored even though modern advances in speech
technology—especially those driven by deep learning (DL) technology—offer …

A review of deep learning based speech synthesis

Y Ning, S He, Z Wu, C Xing, LJ Zhang - Applied Sciences, 2019 - mdpi.com
Speech synthesis, also known as text-to-speech (TTS), has attracted increasingly more
attention. Recent advances on speech synthesis are overwhelmingly contributed by deep …

WORLD: a vocoder-based high-quality speech synthesis system for real-time applications

M Morise, F Yokomori, K Ozawa - IEICE TRANSACTIONS on …, 2016 - search.ieice.org
A vocoder-based speech synthesis system, named WORLD, was developed in an effort to
improve the sound quality of real-time applications using speech. Speech analysis …

A review on human-computer interaction and intelligent robots

F Ren, Y Bao - International Journal of Information Technology & …, 2020 - World Scientific
In the field of artificial intelligence, human–computer interaction (HCI) technology and its
related intelligent robot technologies are essential and interesting contents of research …

D4C, a band-aperiodicity estimator for high-quality speech synthesis

M Morise - Speech Communication, 2016 - Elsevier
An algorithm is proposed for estimating the band aperiodicity of speech signals, where
“aperiodicity” is defined as the power ratio between the speech signal and the aperiodic …

Harvest: A High-Performance Fundamental Frequency Estimator from Speech Signals.

M Morise - INTERSPEECH, 2017 - isca-archive.org
A fundamental frequency (F0) estimator named Harvest is described. The unique points of
Harvest are that it can obtain a reliable F0 contour and reduce the error that the voiced …

Convolutional Neural Network Based Speaker De-Identification.

F Bahmaninezhad, C Zhang, JHL Hansen - Odyssey, 2018 - isca-archive.org
Concealing speaker identity in speech signals refers to the task of speaker de-identification,
which helps protect the privacy of a speaker. Although both linguistic and paralinguistic …

Adjusting dysarthric speech signals to be more intelligible

F Rudzicz - Computer Speech & Language, 2013 - Elsevier
This paper presents a system that transforms the speech signals of speakers with physical
speech disabilities into a more intelligible form that can be more easily understood by …

Recognition of facial expressions and prosodic cues with graded emotional intensities in adults with Asperger syndrome

H Doi, TX Fujisawa, C Kanai, H Ohta, H Yokoi… - Journal of autism and …, 2013 - Springer
This study investigated the ability of adults with Asperger syndrome to recognize emotional
categories of facial expressions and emotional prosodies with graded emotional intensities …

Towards robust neural vocoding for speech generation: A survey

P Hsu, C Wang, AT Liu, H Lee - arXiv preprint arXiv:1912.02461, 2019 - arxiv.org
Recently, neural vocoders have been widely used in speech synthesis tasks, including
text-to-speech and voice conversion. However, when encountering data distribution mismatch …