Authors
Su Zhang, Yi Ding, Ziquan Wei, Cuntai Guan
Publication date
2021
Conference
Proceedings of the IEEE/CVF international conference on computer vision
Pages
3567-3574
Description
We propose an audio-visual spatial-temporal deep neural network with: (1) a visual block containing a pretrained 2D-CNN followed by a temporal convolutional network (TCN); (2) an aural block containing several parallel TCNs; and (3) a leader-follower attentive fusion block combining the audio-visual information. The TCN's large history coverage enables our model to exploit spatial-temporal information within a much larger window length (i.e., 300) than that of the baseline and state-of-the-art methods (i.e., 36 or 48). The fusion block emphasizes the visual modality while exploiting the noisy aural modality via an inter-modality attention mechanism. To make full use of the data and alleviate over-fitting, cross-validation is carried out on the training and validation sets, and concordance correlation coefficient (CCC) centering is used to merge the results from each fold. On the test (development) set of the Aff-Wild2 database, the achieved CCC is 0.463 (0.469) for valence and 0.492 (0.649) for arousal, significantly outperforming the baseline method's corresponding CCC of 0.200 (0.210) for valence and 0.190 (0.230) for arousal. The code will be published upon acceptance of the paper.
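The evaluation metric reported above, the concordance correlation coefficient (CCC), has a standard closed form; a minimal sketch of it in NumPy (the function name `ccc` and the use of population variances are our illustrative choices, not taken from the paper's code):

```python
import numpy as np

def ccc(x, y):
    """Concordance correlation coefficient between two 1-D series.

    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    using population (biased) variance and covariance.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()            # population variances
    cov = ((x - mx) * (y - my)).mean()   # population covariance
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)

# Perfect agreement gives 1.0; perfect anti-agreement gives -1.0.
print(ccc([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0
print(ccc([0, 1, 2], [2, 1, 0]))        # -1.0
```

Unlike Pearson correlation, CCC also penalizes differences in mean and scale between predictions and labels, which is why it is the standard metric for continuous valence/arousal estimation.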
引用总数
学术搜索中的文章
S Zhang, Y Ding, Z Wei, C Guan - Proceedings of the IEEE/CVF international conference …, 2021
S Zhang, Y Ding, Z Wei, C Guan - arXiv preprint arXiv:2107.01175, 2021