Authors
Tamás Gábor Csapó, Géza Németh, Milos Cernak, Philip N Garner
Publication date
2016/8/29
Conference
2016 24th European Signal Processing Conference (EUSIPCO)
Pages
1338-1342
Publisher
IEEE
Description
In this paper, we introduce an improved excitation model for statistical parametric speech synthesis. We extend our earlier vocoder [1], which uses continuous F0 in combination with Maximum Voiced Frequency (MVF). The focus of this paper is on the modeling of unvoiced consonants, for which two alternative methods are proposed. The first method omits postprocessing during MVF estimation in order to reduce the unwanted voiced component of unvoiced speech sounds. The second separates voiced and unvoiced excitation based on the phonetic labels of the text to be synthesized. In an objective experiment, we found that the first method produces unvoiced sounds that are closer to natural speech in terms of Harmonics-to-Noise Ratio (HNR). A subjective listening test showed that both methods are more natural than our baseline system, and that the second method is significantly preferred.
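The general idea of an MVF-based excitation can be illustrated with a simplified sketch. This is not the authors' vocoder: it assumes a generic two-band scheme in which an impulse train at the (continuous, always non-zero) F0 supplies the spectrum below the MVF and white noise supplies the spectrum above it, split with brick-wall masks in the FFT domain. The optional phone label mimics the second, label-driven method by giving unvoiced consonants noise-only excitation. The names `mixed_excitation` and `UNVOICED_PHONES`, the phone set, the sampling rate, and the filtering are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical set of unvoiced consonant labels (assumption, not from the paper).
UNVOICED_PHONES = {"p", "t", "k", "f", "s", "sh", "h"}


def mixed_excitation(f0, mvf, n_samples, fs=16000, phone=None, rng=None):
    """Hypothetical two-band excitation for one frame.

    Below `mvf` (Hz): impulse train at `f0` (Hz, assumed > 0, i.e. continuous F0).
    Above `mvf`: white Gaussian noise.
    If `phone` is an unvoiced consonant label, return noise only
    (a stand-in for label-based voiced/unvoiced separation).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    noise = rng.standard_normal(n_samples)
    if phone in UNVOICED_PHONES:
        return noise  # label-driven: no voiced component at all

    # Impulse train at F0, scaled to roughly unit mean power per sample.
    pulses = np.zeros(n_samples)
    period = max(1, int(round(fs / f0)))
    pulses[::period] = np.sqrt(period)

    # Split the two sources at the MVF with brick-wall masks in the FFT domain.
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    low = freqs < mvf
    voiced = np.fft.irfft(np.fft.rfft(pulses) * low, n=n_samples)
    unvoiced = np.fft.irfft(np.fft.rfft(noise) * ~low, n=n_samples)
    return voiced + unvoiced


# Usage: a voiced frame with MVF at 4 kHz, and a frame labelled as /s/.
frame = mixed_excitation(120.0, 4000.0, 800)
fricative = mixed_excitation(120.0, 4000.0, 800, phone="s")
```

In this sketch, a low MVF estimate on an unvoiced consonant already pushes most of the band to noise, which is the effect the first method targets; the label-based branch removes the voiced component entirely, regardless of the MVF estimate.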