Speech emotion recognition with co-attention based multi-level acoustic information

H Zou, Y Si, C Chen, D Rajan… - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Speech Emotion Recognition (SER) aims to help machines understand humans' subjective emotions from audio information alone. However, extracting and utilizing …

Deep Multimodal Data Fusion

F Zhao, C Zhang, B Geng - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal Artificial Intelligence (Multimodal AI), in general, involves various types of data (e.g., images, texts, or data collected from different sensors), feature engineering (e.g., …

Efficient multimodal transformer with dual-level feature restoration for robust multimodal sentiment analysis

L Sun, Z Lian, B Liu, J Tao - IEEE Transactions on Affective …, 2023 - ieeexplore.ieee.org
With the proliferation of user-generated online videos, Multimodal Sentiment Analysis (MSA)
has attracted increasing attention recently. Despite significant progress, there are still two …

Contextual and cross-modal interaction for multi-modal speech emotion recognition

D Yang, S Huang, Y Liu, L Zhang - IEEE Signal Processing …, 2022 - ieeexplore.ieee.org
Speech emotion recognition that combines linguistic content and audio signals in dialogue is a challenging task. Nevertheless, previous approaches have failed to explore emotion cues in …

Multimodal fusion on low-quality data: A comprehensive survey

Q Zhang, Y Wei, Z Han, H Fu, X Peng, C Deng… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal fusion focuses on integrating information from multiple modalities with the goal of
more accurate prediction, which has achieved remarkable progress in a wide range of …

Multimodal emotion recognition using cross modal audio-video fusion with attention and deep metric learning

B Mocanu, R Tapu, T Zaharia - Image and Vision Computing, 2023 - Elsevier
In the last few years, multi-modal emotion recognition has become an important research issue in the affective computing community due to its wide range of applications that include …

Attention-based multi-learning approach for speech emotion recognition with dilated convolution

S Kakuba, A Poulose, DS Han - IEEE Access, 2022 - ieeexplore.ieee.org
The success of deep learning in speech emotion recognition has led to its application in
resource-constrained devices. It has been applied in human-to-machine interaction …

Is cross-attention preferable to self-attention for multi-modal emotion recognition?

V Rajan, A Brutti, A Cavallaro - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Humans express their emotions via facial expressions, voice intonation and word choices.
To infer the nature of the underlying emotion, recognition models may use a single modality …
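For context on the mechanism the title refers to: in cross-attention fusion, one modality's features serve as queries while the other modality provides keys and values, whereas self-attention fusion attends over a single (often concatenated) sequence. The sketch below is a minimal, generic illustration of cross-attention between audio and text streams, not the implementation from this paper; the module name, feature dimensions, and mean-pooling choice are illustrative assumptions.

import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    # Illustrative cross-attention fusion: audio tokens query text tokens.
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, audio: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # audio: (batch, T_audio, dim); text: (batch, T_text, dim)
        # Queries come from the audio stream; keys/values from the text stream.
        attended, _ = self.cross_attn(query=audio, key=text, value=text)
        fused = self.norm(audio + attended)   # residual connection then layer norm
        return fused.mean(dim=1)              # pooled utterance-level embedding

# Usage sketch with random tensors standing in for real encoder outputs.
audio_feats = torch.randn(8, 100, 256)   # e.g., frame-level acoustic embeddings
text_feats = torch.randn(8, 30, 256)     # e.g., token-level text embeddings
fusion = CrossModalAttentionFusion()
utterance_emb = fusion(audio_feats, text_feats)   # shape (8, 256), fed to an emotion classifier

A self-attention counterpart would instead concatenate the audio and text token sequences and apply attention over the joint sequence; the paper above compares the two designs empirically.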

AMPS: Predicting popularity of short-form videos using multi-modal attention mechanisms in social media marketing environments

M Cho, D Jeong, E Park - Journal of Retailing and Consumer Services, 2024 - Elsevier
Emerging as a dominant content format amid the shift from television to mobile, short-form
videos wield immense potential across diverse domains. However, the scarcity of datasets …

A cross-modal fusion network based on self-attention and residual structure for multimodal emotion recognition

Z Fu, F Liu, H Wang, J Qi, X Fu, A Zhou, Z Li - arXiv preprint arXiv …, 2021 - arxiv.org
Audio-video-based multimodal emotion recognition has attracted a lot of attention due to its robust performance. Most of the existing methods focus on proposing different cross …