Learning cross-modal audiovisual representations with ladder networks for emotion recognition

HiCMAE: Hierarchical Contrastive Masked Autoencoder for Self-Supervised Audio-Visual Emotion Recognition

L Sun, Z Lian, B Liu, J Tao - Information Fusion, 2024 - Elsevier

Audio-Visual Emotion Recognition (AVER) has garnered increasing attention in
recent years for its critical role in creating emotion-aware intelligent machines. Previous …

Cited by 14 Related articles All 3 versions

[PDF] utdallas.edu

[PDF] Versatile audio-visual learning for handling single and multi modalities in emotion regression and classification tasks

L Goncalves, SG Leem, WC Lin, B Sisman… - arXiv preprint arXiv …, 2023 - ecs.utdallas.edu

Most current audio-visual emotion recognition models lack the flexibility needed for
deployment in practical applications. We envision a multimodal system that works even …

Cited by 13 Related articles All 2 versions

[PDF] ieee.org

Selective acoustic feature enhancement for speech emotion recognition with noisy speech

SG Leem, D Fulford, JP Onnela… - … /ACM Transactions on …, 2023 - ieeexplore.ieee.org

A speech emotion recognition (SER) system deployed on a real-world application can
encounter speech contaminated with unconstrained background noise. To deal with this …

Cited by 8 Related articles All 2 versions

[PDF] ieee.org

Versatile audio-visual learning for emotion recognition

L Goncalves, SG Leem, WC Lin… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Most current audio-visual emotion recognition models lack the flexibility needed for
deployment in practical applications. We envision a multimodal system that works even …

Cited by 2 Related articles All 2 versions

[HTML] sciencedirect.com

[HTML] Deep temporal clustering features for speech emotion recognition

WC Lin, C Busso - Speech Communication, 2024 - Elsevier

Deep clustering is a popular unsupervised technique for feature representation learning. We
recently proposed the chunk-based DeepEmoCluster framework for speech emotion …

Cited by 3 Related articles All 3 versions

[PDF] acm.org

Enhancing Resilience to Missing Data in Audio-Text Emotion Recognition with Multi-Scale Chunk Regularization

WC Lin, L Goncalves, C Busso - … of the 25th International Conference on …, 2023 - dl.acm.org

Most existing audio-text emotion recognition studies have focused on the computational
modeling aspects, including strategies for fusing the modalities. An area that has received …

Cited by 4 Related articles All 2 versions

[PDF] arxiv.org

Detail-Enhanced Intra- and Inter-modal Interaction for Audio-Visual Emotion Recognition

T Shi, X Ge, JM Jose, N Pugeault… - … Conference on Pattern …, 2025 - Springer

Capturing complex temporal relationships between video and audio modalities is vital for
Audio-Visual Emotion Recognition (AVER). However, existing methods lack attention to …