Implementation of realtime STRAIGHT speech manipulation system: Report on its first implementation

S Latif, J Qadir, A Qayyum, M Usama… - IEEE Reviews in …, 2020 - ieeexplore.ieee.org

Speech technology is not appropriately explored even though modern advances in speech
technology—especially those driven by deep learning (DL) technology—offer …

Cited by 130 Related articles All 3 versions

[PDF] mdpi.com

A review of deep learning based speech synthesis

Y Ning, S He, Z Wu, C Xing, LJ Zhang - Applied Sciences, 2019 - mdpi.com

Speech synthesis, also known as text-to-speech (TTS), has attracted increasingly more
attention. Recent advances on speech synthesis are overwhelmingly contributed by deep …

Cited by 188 Related articles All 6 versions

[PDF] jst.go.jp

World: a vocoder-based high-quality speech synthesis system for real-time applications

M Morise, F Yokomori, K Ozawa - IEICE TRANSACTIONS on …, 2016 - search.ieice.org

A vocoder-based speech synthesis system, named WORLD, was developed in an effort to
improve the sound quality of real-time applications using speech. Speech analysis …

Cited by 1447 Related articles All 11 versions

[PDF] worldscientific.com

A review on human-computer interaction and intelligent robots

F Ren, Y Bao - International Journal of Information Technology & …, 2020 - World Scientific

In the field of artificial intelligence, human–computer interaction (HCI) technology and its
related intelligent robot technologies are essential and interesting contents of research …

Cited by 134 Related articles All 10 versions

[HTML] sciencedirect.com

[HTML] D4C, a band-aperiodicity estimator for high-quality speech synthesis

M Morise - Speech Communication, 2016 - Elsevier

An algorithm is proposed for estimating the band aperiodicity of speech signals, where
“aperiodicity” is defined as the power ratio between the speech signal and the aperiodic …

Cited by 224 Related articles All 6 versions

[PDF] isca-archive.org

[PDF] Harvest: A High-Performance Fundamental Frequency Estimator from Speech Signals.

M Morise - INTERSPEECH, 2017 - isca-archive.org

A fundamental frequency (F0) estimator named Harvest is described. The unique points of
Harvest are that it can obtain a reliable F0 contour and reduce the error that the voiced …

Cited by 109 Related articles All 4 versions

[PDF] isca-archive.org

[PDF] Convolutional Neural Network Based Speaker De-Identification.

F Bahmaninezhad, C Zhang, JHL Hansen - Odyssey, 2018 - isca-archive.org

Concealing speaker identity in speech signals refers to the task of speaker de-identification,
which helps protect the privacy of a speaker. Although both linguistic and paralinguistic …

Cited by 52 Related articles All 6 versions

[PDF] psu.edu

Adjusting dysarthric speech signals to be more intelligible

F Rudzicz - Computer Speech & Language, 2013 - Elsevier

This paper presents a system that transforms the speech signals of speakers with physical
speech disabilities into a more intelligible form that can be more easily understood by …

Cited by 85 Related articles All 14 versions

Recognition of facial expressions and prosodic cues with graded emotional intensities in adults with Asperger syndrome

H Doi, TX Fujisawa, C Kanai, H Ohta, H Yokoi… - Journal of autism and …, 2013 - Springer

This study investigated the ability of adults with Asperger syndrome to recognize emotional
categories of facial expressions and emotional prosodies with graded emotional intensities …

Cited by 65 Related articles All 13 versions

[PDF] arxiv.org

Towards robust neural vocoding for speech generation: A survey

P Hsu, C Wang, AT Liu, H Lee - arXiv preprint arXiv:1912.02461, 2019 - arxiv.org

Recently, neural vocoders have been widely used in speech synthesis tasks, including text-
to-speech and voice conversion. However, when encountering data distribution mismatch …

Cited by 29 Related articles All 3 versions