Authors
Milos Cernak, Blaise Potard, Philip N Garner
Publication date
2015/4/19
Conference
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Pages
4844-4848
Publisher
IEEE
Description
We investigate a vocoder based on artificial neural networks using a phonological speech representation. Speech decomposition is based on phonological encoders, realised as neural network classifiers, that are trained for a particular language. The speech reconstruction process uses a Deep Neural Network (DNN) to map phonological feature posteriors to speech parameters (line spectra and glottal signal parameters), followed by LPC resynthesis. This DNN is trained on a target voice without transcriptions, in a semi-supervised manner. Both encoder and decoder are based on neural networks, so vocoding is achieved with a simple, fast forward pass. An experiment with French vocoding and a target male voice trained on a 21-hour audiobook is presented. An application of the phonological vocoder to low bit rate speech coding is shown, where transmitted phonological posteriors …
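The encode-then-decode forward pass described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: every dimension, the single-layer encoder, the tanh hidden layer, and the random weights standing in for trained ones are assumptions made for the example, and the final LPC resynthesis step is left out.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are illustrative assumptions, not values from the paper.
N_ACOUSTIC = 39      # acoustic features per frame fed to the encoders
N_PHONOLOGICAL = 12  # phonological classes, one binary classifier each
N_HIDDEN = 64        # hidden units in the decoder network
N_SPEECH = 27        # speech parameters per frame (line spectra + glottal)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def make_layer(n_in, n_out):
    """Return random (weights, bias) standing in for a trained layer."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)


def encode(frames, weights, bias):
    """Bank of binary classifiers: one posterior in [0, 1] per class.

    Simplified to a single logistic layer; the real encoders are
    neural network classifiers of unspecified depth.
    """
    return sigmoid(frames @ weights + bias)


def decode(posteriors, layers):
    """Feed-forward network mapping posteriors to speech parameters."""
    h = posteriors
    for w, b in layers[:-1]:
        h = np.tanh(h @ w + b)
    w, b = layers[-1]
    return h @ w + b  # linear output layer for regression targets


enc_w, enc_b = make_layer(N_ACOUSTIC, N_PHONOLOGICAL)
dec_layers = [
    make_layer(N_PHONOLOGICAL, N_HIDDEN),
    make_layer(N_HIDDEN, N_SPEECH),
]

frames = rng.normal(size=(100, N_ACOUSTIC))      # 100 frames of dummy input
posteriors = encode(frames, enc_w, enc_b)        # shape (100, N_PHONOLOGICAL)
speech_params = decode(posteriors, dec_layers)   # shape (100, N_SPEECH)
```

In this sketch only the `posteriors` array would need to be passed from the encoder side to the decoder side, which is the property the low bit rate coding application relies on.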
Total citations
[Chart: citations per year, 2015-2022]
Scholar articles
M Cernak, B Potard, PN Garner - 2015 IEEE International Conference on Acoustics …, 2015