Amharic spoken digits recognition using convolutional neural network
Spoken digits recognition (SDR) is a type of supervised automatic speech recognition, which
is required in various human–machine interaction applications. It is utilized in phone-based …
Cross-Corpus Multilingual Speech Emotion Recognition: Amharic vs. Other Languages
In a conventional speech emotion recognition (SER) task, a classifier for a given language is
trained on a pre-existing dataset for that same language. However, where training data for a …
MPSA-Conformer-CTC/Attention: A High-Accuracy, Low-Complexity End-to-End Approach for Tibetan Speech Recognition
C Wu, H Sun, K Huang, L Wu - Sensors, 2024 - mdpi.com
This study addresses the challenges of low accuracy and high computational demands in
Tibetan speech recognition by investigating the application of end-to-end networks. We …
MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation
Recent advances in video diffusion models have unlocked new potential for realistic audio-
driven talking video generation. However, achieving seamless audio-lip synchronization …
EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
Speech emotion recognition (SER) is an important part of human-computer interaction,
receiving extensive attention from both industry and academia. However, the current …
Speech-Based Emotion Analysis Using Log-Mel Spectrograms and MFCC Features
AK Yetkin, H Köse - 2023 31st Signal Processing and …, 2023 - ieeexplore.ieee.org
This study proposes a method for recognizing emotions from speech using Mel
spectrograms and MFCC features which capture the spectral features of speech signals. To …
A SPECTROGRAM IMAGE AND SPECTRAL COEFFICIENTS ASSISTED VECTOR QUANTIZATION FOR AMHARIC SPEECH EMOTION RECOGNITION
AB DAGIM - 2024 - ir.bdu.edu.et
Speech emotion recognition (SER) is focused on predicting emotion from speech data,
irrespective of the semantic contents. However, it is pertinent to note that variability in …
Dialect-Based Noisy Speech Dataset, Pre-Processing Tools, and Recognition Models for Amharic
TT Assfaw, T Abebe, B Yalew… - … on Information and …, 2022 - ieeexplore.ieee.org
Nowadays, to facilitate deep learning for voice recognition, a large number of dialect-based
speech dataset for different languages have been produced. However, not such more …
Speech Emotion Recognition: Recent Advances and Current Trends
S Belkacem - International Conference on Artificial Intelligence and …, 2023 - Springer
Speech emotion recognition has become an important endeavor in Human-computer-
Interaction. Emotion recognition and understanding by computer are used for medical …
MEMO: Memory-Guided and Emotion-Aware Talking Video Generation
Advances in video diffusion models have unlocked the potential for realistic audio-driven
talking video generation. However, it is still highly challenging to ensure seamless audio-lip …