Survey of deep learning paradigms for speech processing

KB Bhangale, M Kothandaraman - Wireless Personal Communications, 2022 - Springer
Over the past decades, a particular focus is given to research on machine learning
techniques for speech processing applications. However, in the past few years, research …

A comprehensive survey on multi-modal conversational emotion recognition with deep learning

Y Shou, T Meng, W Ai, N Yin, K Li - arXiv preprint arXiv:2312.05735, 2023 - arxiv.org
Multi-modal conversation emotion recognition (MCER) aims to recognize and track the
speaker's emotional state using text, speech, and visual information in the conversation …

Curriculum learning: A survey

P Soviany, RT Ionescu, P Rota, N Sebe - International Journal of …, 2022 - Springer
Training machine learning models in a meaningful order, from the easy samples to the hard
ones, using curriculum learning can provide performance improvements over the standard …

AVEC 2019 workshop and challenge: state-of-mind, detecting depression with AI, and cross-cultural affect recognition

F Ringeval, B Schuller, M Valstar, N Cummins… - Proceedings of the 9th …, 2019 - dl.acm.org
The Audio/Visual Emotion Challenge and Workshop (AVEC 2019) 'State-of-Mind, Detecting
Depression with AI, and Cross-cultural Affect Recognition' is the ninth competition event …

x-vectors meet emotions: A study on dependencies between emotion and speaker recognition

R Pappagari, T Wang, J Villalba… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
In this work, we explore the dependencies between speaker recognition and emotion
recognition. We first show that knowledge learned for speaker recognition can be reused for …

Self-Attention for Speech Emotion Recognition

L Tarantino, PN Garner, A Lazaridis - Interspeech, 2019 - publications.idiap.ch
Speech Emotion Recognition (SER) has been shown to benefit from many of the
recent advances in deep learning, including recurrent based and attention based neural …

Expressive TTS training with frame and style reconstruction loss

R Liu, B Sisman, G Gao, H Li - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
We propose a novel training strategy for Tacotron-based text-to-speech (TTS) system that
improves the speech styling at utterance level. One of the key challenges in prosody …

Adaptive curriculum learning

Y Kong, L Liu, J Wang, D Tao - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Inspired by the human learning principle that learning easier concepts first and then
gradually paying more attention to harder ones, curriculum learning uses the non-uniform …

Multi-classifier interactive learning for ambiguous speech emotion recognition

Y Zhou, X Liang, Y Gu, Y Yin… - IEEE/ACM transactions on …, 2022 - ieeexplore.ieee.org
In recent years, speech emotion recognition technology is of great significance in
widespread applications such as call centers, social robots and health care. Thus, the …

EmoBed: Strengthening monomodal emotion recognition via training with crossmodal emotion embeddings

J Han, Z Zhang, Z Ren… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
Despite remarkable advances in emotion recognition, they are severely restrained from
either the essentially limited property of the employed single modality, or the synchronous …