Multi-Modal Learning for Speech Emotion Recognition: An Analysis and Comparison of ASR Outputs...

Survey on bimodal speech emotion recognition from acoustic and linguistic information fusion

BT Atmaja, A Sasou, M Akagi - Speech Communication, 2022 - Elsevier

Speech emotion recognition (SER) is traditionally performed using merely acoustic
information. Acoustic features, commonly extracted per frame, are mapped into emotion …

Cited by 69 - Related articles - All 7 versions

Dawn of the transformer era in speech emotion recognition: closing the valence gap

J Wagner, A Triantafyllopoulos… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org

Recent advances in transformer-based architectures have shown promise in several
machine learning tasks. In the audio domain, such architectures have been successfully …

Cited by 210 - Related articles - All 8 versions

Contextual and cross-modal interaction for multi-modal speech emotion recognition

D Yang, S Huang, Y Liu, L Zhang - IEEE Signal Processing …, 2022 - ieeexplore.ieee.org

Speech emotion recognition combining linguistic content and audio signals in the dialog is a
challenging task. Nevertheless, previous approaches have failed to explore emotion cues in …

Cited by 39 - Related articles - All 3 versions

Fusing ASR outputs in joint training for speech emotion recognition

Y Li, P Bell, C Lai - ICASSP 2022-2022 IEEE International …, 2022 - ieeexplore.ieee.org

Alongside acoustic information, linguistic features based on speech transcripts have been
proven useful in Speech Emotion Recognition (SER). However, due to the scarcity of …

Cited by 52 - Related articles - All 6 versions

Self-supervised contrastive cross-modality representation learning for spoken question answering

C You, N Chen, Y Zou - arXiv preprint arXiv:2109.03381, 2021 - arxiv.org

Spoken question answering (SQA) requires fine-grained understanding of both spoken
documents and questions for the optimal answer prediction. In this paper, we propose novel …

Cited by 52 - Related articles - All 6 versions

Multimodal emotion recognition with temporal and semantic consistency

B Chen, Q Cao, M Hou, Z Zhang, G Lu… - … /ACM Transactions on …, 2021 - ieeexplore.ieee.org

Automated multimodal emotion recognition has become an emerging but challenging
research topic in the fields of affective learning and sentiment analysis. The existing works …

Cited by 35 - Related articles - All 3 versions

A fine-grained modal label-based multi-stage network for multimodal sentiment analysis

J Peng, T Wu, W Zhang, F Cheng, S Tan, F Yi… - Expert Systems with …, 2023 - Elsevier

Sentiment analysis is a challenging but valuable research topic in affective computing. It can
improve the quality of various real-world applications, including financial market prediction …

Cited by 13 - Related articles - All 2 versions

Cross-corpus speech emotion recognition based on few-shot learning and domain adaptation

Y Ahn, SJ Lee, JW Shin - IEEE Signal Processing Letters, 2021 - ieeexplore.ieee.org

Within a single speech emotion corpus, deep neural networks have shown decent
performance in speech emotion recognition. However, the performance of the emotion …

Cited by 41 - Related articles - All 2 versions

Fusion approaches for emotion recognition from speech using acoustic and text-based features

L Pepino, P Riera, L Ferrer… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org

In this paper, we study different approaches for classifying emotions from speech using
acoustic and text-based features. We propose to obtain contextualized word embeddings …

Cited by 58 - Related articles - All 4 versions

Speaker-invariant affective representation learning via adversarial training

H Li, M Tu, J Huang, S Narayanan… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

Representation learning for speech emotion recognition is challenging due to labeled data
sparsity issue and lack of gold-standard references. In addition, there is much variability from …

Cited by 60 - Related articles - All 6 versions