Deep learning-based multimodal emotion recognition from audio, visual, and text modalities: A systematic review of recent advancements and future prospects
S Zhang, Y Yang, C Chen, X Zhang, Q Leng… - Expert Systems with …, 2024 - Elsevier
Emotion recognition has recently attracted extensive interest due to its significant
applications to human–computer interaction. The expression of human emotion depends on …
A survey of multimodal deep generative models
Multimodal learning is a framework for building models that make predictions based on
different types of modalities. Important challenges in multimodal learning are the inference of …
Dawn of the transformer era in speech emotion recognition: closing the valence gap
J Wagner, A Triantafyllopoulos… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Recent advances in transformer-based architectures have shown promise in several
machine learning tasks. In the audio domain, such architectures have been successfully …
Decoupled multimodal distilling for emotion recognition
Human multimodal emotion recognition (MER) aims to perceive human emotions via
language, visual and acoustic modalities. Despite the impressive performance of previous …
Improving multimodal fusion with hierarchical mutual information maximization for multimodal sentiment analysis
In multimodal sentiment analysis (MSA), the performance of a model highly depends on the
quality of synthesized embeddings. These embeddings are generated from the upstream …
Disentangled representation learning for multimodal emotion recognition
Multimodal emotion recognition aims to identify human emotions from text, audio, and visual
modalities. Previous methods either explore correlations between different modalities or …
MISA: Modality-invariant and -specific representations for multimodal sentiment analysis
Multimodal Sentiment Analysis is an active area of research that leverages multimodal
signals for affective understanding of user-generated videos. The predominant approach …
Multimodal transformer for unaligned multimodal language sequences
Human language is often multimodal, comprising a mixture of natural language,
facial gestures, and acoustic behaviors. However, two major challenges in modeling such …
UniVL: A unified video and language pre-training model for multimodal understanding and generation
With the recent success of the pre-training technique for NLP and image-linguistic tasks,
video-linguistic pre-training works have gradually been developed to improve video-text …
Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph
Analyzing human multimodal language is an emerging area of research in NLP. Intrinsically
this language is multimodal (heterogeneous), sequential and asynchronous; it consists of …