HiCMAE: Hierarchical Contrastive Masked Autoencoder for Self-Supervised Audio-Visual Emotion Recognition

L Sun, Z Lian, B Liu, J Tao - Information Fusion, 2024 - Elsevier
Abstract Audio-Visual Emotion Recognition (AVER) has garnered increasing attention in
recent years for its critical role in creating emotion-aware intelligent machines. Previous …

Versatile audio-visual learning for handling single and multi modalities in emotion regression and classification tasks

L Goncalves, SG Leem, WC Lin, B Sisman… - arXiv preprint arXiv …, 2023 - ecs.utdallas.edu
Most current audio-visual emotion recognition models lack the flexibility needed for
deployment in practical applications. We envision a multimodal system that works even …

Selective acoustic feature enhancement for speech emotion recognition with noisy speech

SG Leem, D Fulford, JP Onnela… - … /ACM Transactions on …, 2023 - ieeexplore.ieee.org
A speech emotion recognition (SER) system deployed on a real-world application can
encounter speech contaminated with unconstrained background noise. To deal with this …

Versatile audio-visual learning for emotion recognition

L Goncalves, SG Leem, WC Lin… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Most current audio-visual emotion recognition models lack the flexibility needed for
deployment in practical applications. We envision a multimodal system that works even …

Deep temporal clustering features for speech emotion recognition

WC Lin, C Busso - Speech Communication, 2024 - Elsevier
Deep clustering is a popular unsupervised technique for feature representation learning. We
recently proposed the chunk-based DeepEmoCluster framework for speech emotion …

Enhancing Resilience to Missing Data in Audio-Text Emotion Recognition with Multi-Scale Chunk Regularization

WC Lin, L Goncalves, C Busso - … of the 25th International Conference on …, 2023 - dl.acm.org
Most existing audio-text emotion recognition studies have focused on the computational
modeling aspects, including strategies for fusing the modalities. An area that has received …

Detail-Enhanced Intra- and Inter-modal Interaction for Audio-Visual Emotion Recognition

T Shi, X Ge, JM Jose, N Pugeault… - … Conference on Pattern …, 2025 - Springer
Capturing complex temporal relationships between video and audio modalities is vital for
Audio-Visual Emotion Recognition (AVER). However, existing methods lack attention to …