Survey on bimodal speech emotion recognition from acoustic and linguistic information fusion

BT Atmaja, A Sasou, M Akagi - Speech Communication, 2022 - Elsevier
Speech emotion recognition (SER) is traditionally performed using merely acoustic
information. Acoustic features, commonly extracted per frame, are mapped into emotion …

Dawn of the transformer era in speech emotion recognition: closing the valence gap

J Wagner, A Triantafyllopoulos… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Recent advances in transformer-based architectures have shown promise in several
machine learning tasks. In the audio domain, such architectures have been successfully …

Contextual and cross-modal interaction for multi-modal speech emotion recognition

D Yang, S Huang, Y Liu, L Zhang - IEEE Signal Processing …, 2022 - ieeexplore.ieee.org
Speech emotion recognition combining linguistic content and audio signals in the dialog is a
challenging task. Nevertheless, previous approaches have failed to explore emotion cues in …

Fusing ASR outputs in joint training for speech emotion recognition

Y Li, P Bell, C Lai - ICASSP 2022-2022 IEEE International …, 2022 - ieeexplore.ieee.org
Alongside acoustic information, linguistic features based on speech transcripts have been
proven useful in Speech Emotion Recognition (SER). However, due to the scarcity of …

Self-supervised contrastive cross-modality representation learning for spoken question answering

C You, N Chen, Y Zou - arXiv preprint arXiv:2109.03381, 2021 - arxiv.org
Spoken question answering (SQA) requires fine-grained understanding of both spoken
documents and questions for the optimal answer prediction. In this paper, we propose novel …

Multimodal emotion recognition with temporal and semantic consistency

B Chen, Q Cao, M Hou, Z Zhang, G Lu… - … /ACM Transactions on …, 2021 - ieeexplore.ieee.org
Automated multimodal emotion recognition has become an emerging but challenging
research topic in the fields of affective learning and sentiment analysis. The existing works …

A fine-grained modal label-based multi-stage network for multimodal sentiment analysis

J Peng, T Wu, W Zhang, F Cheng, S Tan, F Yi… - Expert Systems with …, 2023 - Elsevier
Sentiment analysis is a challenging but valuable research topic in affective computing. It can
improve the quality of various real-world applications, including financial market prediction …

Cross-corpus speech emotion recognition based on few-shot learning and domain adaptation

Y Ahn, SJ Lee, JW Shin - IEEE Signal Processing Letters, 2021 - ieeexplore.ieee.org
Within a single speech emotion corpus, deep neural networks have shown decent
performance in speech emotion recognition. However, the performance of the emotion …

Fusion approaches for emotion recognition from speech using acoustic and text-based features

L Pepino, P Riera, L Ferrer… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
In this paper, we study different approaches for classifying emotions from speech using
acoustic and text-based features. We propose to obtain contextualized word embeddings …

Speaker-invariant affective representation learning via adversarial training

H Li, M Tu, J Huang, S Narayanan… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Representation learning for speech emotion recognition is challenging due to the sparsity of labeled data
and the lack of gold-standard references. In addition, there is much variability from …