Speech emotion recognition using deep 1D & 2D CNN LSTM networks

J Zhao, X Mao, L Chen - Biomedical signal processing and control, 2019 - Elsevier
We aimed at learning deep emotion features to recognize speech emotion. Two
convolutional neural network and long short-term memory (CNN LSTM) networks, one 1D …

Learning deep features to recognise speech emotion using merged deep CNN

J Zhao, X Mao, L Chen - IET Signal Processing, 2018 - Wiley Online Library
This study aims at learning deep features from different data to recognise speech emotion.
The authors designed a merged convolutional neural network (CNN), which had two …

Exploiting the potentialities of features for speech emotion recognition

D Li, Y Zhou, Z Wang, D Gao - Information Sciences, 2021 - Elsevier
In recent years, studies on speech signals have increasingly paid attention to emotional
information. The most challenging aspect in speech emotion recognition (SER) is choosing …

Discriminatively trained recurrent neural networks for continuous dimensional emotion recognition from audio

F Weninger, F Ringeval, E Marchi, B Schuller - 2016 - opus.bibliothek.uni-augsburg.de
Continuous dimensional emotion recognition from audio is a sequential regression problem,
where the goal is to maximize correlation between sequences of regression outputs and …

On-line continuous-time music mood regression with deep recurrent neural networks

F Weninger, F Eyben, B Schuller - 2014 IEEE international …, 2014 - ieeexplore.ieee.org
This paper proposes a novel machine learning approach for the task of on-line
continuous-time music mood regression, i.e., low-latency prediction of the time-varying arousal and …

Musical audio synthesis using autoencoding neural nets

A Sarroff, MA Casey - … of the International Society for Music …, 2014 - research.gold.ac.uk
With an optimal network topology and tuning of hyperparameters, artificial neural networks
(ANNs) may be trained to learn a mapping from low level audio features to one or more …

A systematic evaluation of the bag-of-frames representation for music information retrieval

L Su, CCM Yeh, JY Liu, JC Wang… - IEEE Transactions on …, 2014 - ieeexplore.ieee.org
There has been an increasing attention on learning feature representations from the
complex, high-dimensional audio data applied in various music information retrieval (MIR) …

Understanding affective content of music videos through learned representations

E Acar, F Hopfgartner, S Albayrak - … MMM 2014, Dublin, Ireland, January 6 …, 2014 - Springer
In consideration of the ever-growing available multimedia data, annotating multimedia
content automatically with feeling(s) expected to arise in users is a challenging problem. In …

Emotion recognition from thermal infrared images using deep Boltzmann machine

S Wang, M He, Z Gao, S He, Q Ji - Frontiers of Computer Science, 2014 - Springer
Facial expression and emotion recognition from thermal infrared images has attracted more
and more attentions in recent years. However, the features adopted in current work are …

Recognizing semi-natural and spontaneous speech emotions using deep neural networks

A Amjad, L Khan, N Ashraf, MB Mahmood… - IEEE …, 2022 - ieeexplore.ieee.org
We needed to find deep emotional features to identify emotions from audio signals.
Identifying emotions in spontaneous speech is a novel and challenging subject of research …