Speech emotion recognition using 3D convolutions and attention-based sliding recurrent networks with auditory front-ends

Z Peng, X Li, Z Zhu, M Unoki, J Dang, M Akagi - IEEE Access, 2020 - ieeexplore.ieee.org
Emotion information from speech can effectively help robots understand a speaker's intentions
in natural human-robot interaction. The human auditory system can easily track temporal …

Multi-resolution modulation-filtered cochleagram feature for LSTM-based dimensional emotion recognition from speech

Z Peng, J Dang, M Unoki, M Akagi - Neural Networks, 2021 - Elsevier
Continuous dimensional emotion recognition from speech helps robots or virtual agents
capture the temporal dynamics of a speaker's emotional state in natural human–robot …

Robust voice activity detection using an auditory-inspired masked modulation encoder based convolutional attention network

N Li, L Wang, M Ge, M Unoki, S Li, J Dang - Speech Communication, 2024 - Elsevier
Deep learning has revolutionized voice activity detection (VAD) by offering promising
solutions. However, directly applying traditional features, such as raw waveforms and Mel …

Relationship between contributions of temporal amplitude envelope of speech and modulation transfer function in room acoustics to perception of noise-vocoded …

M Unoki, Z Zhu - Acoustical Science and Technology, 2020 - jstage.jst.go.jp
Speech signals can be represented as a sum of amplitude-modulated frequency bands. This
sum can also be regarded as a temporal amplitude envelope (TAE) with temporal fine …

Envelope estimation using geometric properties of a discrete real signal

CHT Santos, V Pereira - Digital Signal Processing, 2022 - Elsevier
Despite being an elusive concept, the temporal amplitude envelope of a signal is essential
for its complete characterization, being the primary information-carrying medium in spoken …

Increasing speech intelligibility and naturalness in noise based on concepts of modulation spectrum and modulation transfer function

T Ngo, R Kubo, M Akagi - Speech Communication, 2021 - Elsevier
This study focuses on identifying effective features for controlling speech to increase speech
intelligibility under adverse conditions. Previous approaches either cancel noise throughout …

Contribution of modulation spectral features on the perception of vocal-emotion using noise-vocoded speech

Z Zhu, R Miyauchi, Y Araki, M Unoki - Acoustical Science and …, 2018 - jstage.jst.go.jp
Previous studies on noise-vocoded speech showed that the temporal modulation cues
provided by the temporal envelope play an important role in the perception of vocal emotion …

Enhancing Dimensional Emotion Recognition from Speech through Modulation-Filtered Cochleagram and Parallel Attention Recurrent Network

Z Peng, H Zeng, Y Li, Y Du, J Dang - Electronics, 2023 - mdpi.com
Dimensional emotion can better describe rich and fine-grained emotional states than
categorical emotion. In the realm of human–robot interaction, the ability to continuously …

Contribution of common modulation spectral features to vocal-emotion recognition of noise-vocoded speech in noisy reverberant environments

T Guo, Z Zhu, S Kidani, M Unoki - Applied Sciences, 2022 - mdpi.com
In one study on vocal emotion recognition using noise-vocoded speech (NVS), the high
similarities between modulation spectral features (MSFs) and the results of vocal-emotion …

A study of salient modulation domain features for speaker identification

SW McKnight, AOT Hogg, VW Neo… - 2021 Asia-Pacific …, 2021 - ieeexplore.ieee.org
This paper studies the ranges of acoustic and modulation frequencies of speech most
relevant for identifying speakers and compares the speaker-specific information present in …