Spatio-temporal representation learning enhanced speech emotion recognition with multi-head attention mechanisms

Z Chen, M Lin, Z Wang, Q Zheng, C Liu - Knowledge-Based Systems, 2023 - Elsevier
Speech emotion recognition (SER) systems have become essential in various fields,
including intelligent healthcare, customer service, call centers, automatic translation …

Decoupled and Explainable Associative Memory for Effective Knowledge Propagation

T Fernando, D Priyasad, S Sridharan… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Long-term memory often plays a pivotal role in human cognition through the analysis of
contextual information. Machine learning researchers have attempted to emulate this …

Emotional cues extraction and fusion for multi-modal emotion prediction and recognition in conversation

H Shi, Z Liang, J Yu - arXiv preprint arXiv:2408.04547, 2024 - arxiv.org
Emotion Prediction in Conversation (EPC) aims to forecast the emotions of forthcoming
utterances by utilizing preceding dialogues. Previous EPC approaches relied on simple …

Enhancing Multimodal Emotion Recognition through ASR Error Compensation and LLM Fine-Tuning

J Kyung, S Heo, JH Chang - Proc. Interspeech 2024, 2024 - isca-archive.org
Multimodal emotion recognition (MER), particularly using speech and text, is promising for
enhancing human-computer interaction. However, the efficacy of such systems is often …

Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques

Y Li, P Bell, C Lai - arXiv preprint arXiv:2406.08353, 2024 - arxiv.org
Text data is commonly utilized as a primary input to enhance Speech Emotion Recognition
(SER) performance and reliability. However, the reliance on human-transcribed text in most …

Cross-modal Features Interaction-and-Aggregation Network with Self-consistency Training for Speech Emotion Recognition

Y Hu, H Yang, H Huang, L He - Proc. Interspeech 2024, 2024 - isca-archive.org
In recent years, much research has been conducted into speech emotion recognition (SER) using
multimodal data. Selective fusion of the features from different modalities is critical for …

Large Language Model-Based Emotional Speech Annotation Using Context and Acoustic Feature for Speech Emotion Recognition

J Santoso, K Ishizuka… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
The remarkable emergence of large language models (LLM) and their vast capabilities have
opened a possibility for applications in various fields, including speech emotion recognition …

MMRBN: Rule-Based Network for Multimodal Emotion Recognition

X Chen - ICASSP 2024-2024 IEEE International Conference on …, 2024 - ieeexplore.ieee.org
Human emotion is usually expressed in multiple modalities, like audio and text. Multimodal
methods can boost Emotion Recognition. However, the relationship between audio and text …

Can Machine Learning Models Recognise Emotions, Particularly Neutral, Better Than Humans?

J Siby, ELC Law - 2024.conversations.ws
Audio and visual data play vital roles in emotion recognition, with machine learning (ML)
methods like SVMs and deep neural networks excelling in inferring human emotions. This …

Speech emotion recognition based on crossmodal transformer and attention weight correction

R Terui, T Yamada - apsipa2024.org
In recent years, speech emotion recognition (SER) methods that use both acoustic features
and text features derived through automatic speech recognition (ASR) have become …