Authors
Chin-Hui Lee, Chih-Heng Lin, Biing-Hwang Juang
Publication date
1991/4/1
Journal
IEEE Transactions on Signal Processing
Volume
39
Issue
4
Pages
806-814
Description
It is generally agreed that, for a given speech recognition task, a speaker-dependent system usually outperforms a speaker-independent system, as long as a sufficient amount of training data is available. When the amount of speaker-specific training data is limited, however, such a performance gain is not guaranteed. One way to improve the performance is to make use of existing knowledge, contained in a rich speaker-independent (or multispeaker) database, so that a minimum amount of training data is sufficient to model the new speaker. Such a training procedure is often referred to as speaker adaptation when the a priori knowledge is derived from a speaker-independent (or multispeaker) database, and as speaker conversion when the knowledge is derived from a different speaker. We mainly address the speaker adaptation issue here. For a speech recognition system based on continuous density hidden Markov models (CDHMM), speaker adaptation of the CDHMM parameters is formulated as a Bayesian learning procedure. In this study, we present a speaker adaptation procedure that is easily integrated into the segmental k-means training procedure for obtaining adaptive estimates of the CDHMM parameters. We report some results for adapting both the mean and the diagonal covariance matrix of the Gaussian state observation densities of a CDHMM. When testing on a 39-word English alpha-digit vocabulary in isolated word mode, the results indicate that the speaker adaptation procedure achieves the same level of performance as a speaker-independent system, when one training token from each word is used to perform …
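As a rough illustration of the Bayesian learning idea described above, the sketch below adapts the mean of a single diagonal-covariance Gaussian toward a few frames from a new speaker. It is not the procedure from the paper: it uses the textbook conjugate-prior (MAP) update for a Gaussian mean, it leaves the covariance untouched, and it ignores the HMM state segmentation entirely. The function name `map_adapt_mean` and the prior weight `tau` are hypothetical names chosen for this example.

```python
from typing import List


def map_adapt_mean(prior_mean: List[float],
                   frames: List[List[float]],
                   tau: float) -> List[float]:
    """MAP estimate of a Gaussian mean, computed per dimension.

    prior_mean: mean taken from a speaker-independent model (assumed given).
    frames:     feature vectors from the new speaker assigned to this Gaussian.
    tau:        hypothetical prior weight; it acts like a count of
                "virtual" observations located at prior_mean.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    n = len(frames)
    adapted = []
    for d, mu0 in enumerate(prior_mean):
        # Sum of the new speaker's observations in dimension d.
        total = sum(frame[d] for frame in frames)
        # Weighted combination of the prior mean and the observed data.
        adapted.append((tau * mu0 + total) / (tau + n))
    return adapted


if __name__ == "__main__":
    si_mean = [0.0, 1.0]                      # speaker-independent prior mean
    new_frames = [[3.0, 1.0], [3.0, 3.0]]     # two frames from the new speaker
    print(map_adapt_mean(si_mean, new_frames, tau=2.0))
```

With no adaptation frames the estimate equals the prior mean, and as the number of frames grows it approaches the sample mean of the new speaker's data, which mirrors the trade-off between speaker-independent knowledge and limited speaker-specific data discussed in the description.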
Total citations
[Bar chart: citations per year, 1990-2021]