Deep learning techniques for speech emotion recognition, from databases to models
The advancements in neural networks and the on-demand need for accurate and near
real-time Speech Emotion Recognition (SER) in human–computer interactions make it …
A systematic literature review of speech emotion recognition approaches
YB Singh, S Goel - Neurocomputing, 2022 - Elsevier
Nowadays emotion recognition from speech (SER) is a demanding research area for
researchers because of its wide real-life applications. There are many challenges for SER …
EMO: Emote Portrait Alive Generating Expressive Portrait Videos with Audio2Video Diffusion Model Under Weak Conditions
In this work, we tackle the challenge of enhancing the realism and expressiveness in talking
head video generation by focusing on the dynamic and nuanced relationship between audio …
EMOCA: Emotion driven monocular face capture and animation
As 3D facial avatars become more widely used for communication, it is critical that they
faithfully convey emotion. Unfortunately, the best recent methods that regress parametric 3D …
Balanced multimodal learning via on-the-fly gradient modulation
Audio-visual learning helps to comprehensively understand the world, by integrating
different senses. Accordingly, multiple input modalities are expected to boost model …
Diffused Heads: Diffusion models beat GANs on talking-face generation
M Stypułkowski, K Vougioukas, S He… - Proceedings of the …, 2024 - openaccess.thecvf.com
Talking face generation has historically struggled to produce head movements and natural
facial expressions without guidance from additional reference videos. Recent developments …
EAMM: One-shot emotional talking face via audio-based emotion-aware motion model
Although significant progress has been made to audio-driven talking face generation,
existing methods either neglect facial emotion or cannot be applied to arbitrary subjects. In …
BigSSL: Exploring the frontier of large-scale semi-supervised learning for automatic speech recognition
We summarize the results of a host of efforts using giant automatic speech recognition (ASR)
models pre-trained using large, diverse unlabeled datasets containing approximately a …
MEAD: A large-scale audio-visual dataset for emotional talking-face generation
The synthesis of natural emotional reactions is an essential criterion in vivid talking-face
video generation. This criterion is nevertheless seldom taken into consideration in previous …
The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American …
SR Livingstone, FA Russo - PloS one, 2018 - journals.plos.org
The RAVDESS is a validated multimodal database of emotional speech and song. The
database is gender balanced consisting of 24 professional actors, vocalizing lexically …