Spatio-temporal representation learning enhanced speech emotion recognition with multi-head attention mechanisms
Speech emotion recognition (SER) systems have become essential in various fields,
including intelligent healthcare, customer service, call centers, automatic translation …
Decoupled and Explainable Associative Memory for Effective Knowledge Propagation
Long-term memory often plays a pivotal role in human cognition through the analysis of
contextual information. Machine learning researchers have attempted to emulate this …
Emotional cues extraction and fusion for multi-modal emotion prediction and recognition in conversation
H Shi, Z Liang, J Yu - arXiv preprint arXiv:2408.04547, 2024 - arxiv.org
Emotion Prediction in Conversation (EPC) aims to forecast the emotions of forthcoming
utterances by utilizing preceding dialogues. Previous EPC approaches relied on simple …
Enhancing Multimodal Emotion Recognition through ASR Error Compensation and LLM Fine-Tuning
Multimodal emotion recognition (MER), particularly using speech and text, is promising for
enhancing human-computer interaction. However, the efficacy of such systems is often …
Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques
Text data is commonly utilized as a primary input to enhance Speech Emotion Recognition
(SER) performance and reliability. However, the reliance on human-transcribed text in most …
Cross-modal Features Interaction-and-Aggregation Network with Self-consistency Training for Speech Emotion Recognition
In recent years, much research has been conducted on speech emotion recognition (SER) using
multimodal data. Selective fusion of the features from different modalities is critical for …
Large Language Model-Based Emotional Speech Annotation Using Context and Acoustic Feature for Speech Emotion Recognition
J Santoso, K Ishizuka… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
The remarkable emergence of large language models (LLMs) and their vast capabilities have
opened possibilities for applications in various fields, including speech emotion recognition …
MMRBN: Rule-Based Network for Multimodal Emotion Recognition
X Chen - ICASSP 2024-2024 IEEE International Conference on …, 2024 - ieeexplore.ieee.org
Human emotion is usually expressed in multiple modalities, like audio and text. Multimodal
methods can boost emotion recognition. However, the relationship between audio and text …
Can Machine Learning Models Recognise Emotions, Particularly Neutral, Better Than Humans?
J Siby, ELC Law - 2024.conversations.ws
Audio and visual data play vital roles in emotion recognition, with machine learning (ML)
methods like SVMs and deep neural networks excelling in inferring human emotions. This …
Speech emotion recognition based on crossmodal transformer and attention weight correction
R Terui, T Yamada - apsipa2024.org
In recent years, speech emotion recognition (SER) methods that use both acoustic features
and text features derived through automatic speech recognition (ASR) have become …