作者
Zengwei Yao, Zihao Wang, Weihuang Liu, Yaqian Liu, Jiahui Pan
发表日期
2020/6/1
期刊
Speech Communication
卷号
120
页码范围
11-19
出版商
North-Holland
简介
Speech emotion recognition plays an increasingly important role in emotional computing and is still a challenging task due to its complexity. In this study, we developed a framework integrating three distinctive classifiers: a deep neural network (DNN), a convolution neural network (CNN), and a recurrent neural network (RNN). The framework was used for categorical recognition of four discrete emotions (i.e., angry, happy, neutral and sad). Frame-level low-level descriptors (LLDs), segment-level mel-spectrograms (MS), and utterance-level outputs of high-level statistical functions (HSFs) on LLDs were passed to RNN, CNN, and DNN, separately. Three individual models of LLD-RNN, MS-CNN, and HSF-DNN were obtained. In the models of MS-CNN and LLD-RNN, the attention mechanism based weighted-pooling method was utilized to aggregate the CNN and RNN outputs. To effectively utilize the …
引用总数
20202021202220232024632335413