Leveraging speech PTM, text LLM, and emotional TTS for speech emotion recognition

M Jin, S Wang, L Ma, Z Chu, JY Zhang, X Shi… - arXiv preprint arXiv …, 2023 - arxiv.org

Time series forecasting holds significant importance in many real-world dynamic systems
and has been extensively studied. Unlike natural language process (NLP) and computer …

Cited by 358 | Related articles | All 8 versions

Generative technology for human emotion recognition: A scoping review

F Ma, Y Yuan, Y Xie, H Ren, I Liu, Y He, F Ren, FR Yu… - Information …, 2024 - Elsevier

Affective computing stands at the forefront of artificial intelligence (AI), seeking to imbue
machines with the ability to comprehend and respond to human emotions. Central to this …

Cited by 2 | Related articles

emotion2vec: Self-supervised pre-training for speech emotion representation

Z Ma, Z Zheng, J Ye, J Li, Z Gao, S Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

We propose emotion2vec, a universal speech emotion representation model. emotion2vec
is pre-trained on open-source unlabeled emotion data through self-supervised online …

Cited by 56 | Related articles | All 2 versions

Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning

G Yang, Z Ma, Z Zheng, Y Song, Z Niu… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

Recent years have witnessed significant advancements in self-supervised learning (SSL)
methods for speech-processing tasks. Various speech-based SSL models have been …

Cited by 7 | Related articles | All 3 versions

Speech emotion recognition using dual-stream representation and cross-attention fusion

S Yu, J Meng, W Fan, Y Chen, B Zhu, H Yu, Y Xie… - Electronics, 2024 - mdpi.com

Speech emotion recognition (SER) aims to recognize human emotions through in-depth
analysis of audio signals. However, it remains challenging to encode emotional cues and to …

Cited by 3 | Related articles

Improving Teacher Training Through Emotion Recognition and Data Fusion

M Albaladejo‐González, R Gaspar‐Marco… - Expert …, 2024 - Wiley Online Library

The quality of education hinges on the proficiency and training of educators. Due to the
importance of teacher training, the innovative platform Teacher Moments creates simulated …

A Subconvolutional U-net with Gated Recurrent Unit and Efficient Channel Attention Mechanism for Real-Time Speech Enhancement

S Yechuri, S Vanambathina - Wireless Personal Communications, 2024 - Springer

We propose a subconvolutional U-net with a gated recurrent unit and an efficient channel
attention mechanism for real-time speech enhancement. The subconvolutional U-net (SCU …

Cited by 3 | Related articles

Gradient-Level Differential Privacy Against Attribute Inference Attack for Speech Emotion Recognition

H Chen, H Zhao, Z Zhang - IEEE Signal Processing Letters, 2024 - ieeexplore.ieee.org

The Federated Learning (FL) paradigm for distributed privacy preservation is valued for its
ability to collaboratively train Speech Emotion Recognition (SER) models while keeping …

Efficient VoIP Communications through LLM-based Real-Time Speech Reconstruction and Call Prioritization for Emergency Services

D Venkateshperumal, RA Rafi, S Ahmed… - arXiv preprint arXiv …, 2024 - arxiv.org

Emergency communication systems face disruptions due to packet loss, bandwidth
constraints, poor signal quality, delays, and jitter in VoIP systems, leading to degraded real …

Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap

G Yang, F Yu, Z Ma, Z Du, Z Gao, S Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

While automatic speech recognition (ASR) systems have achieved remarkable performance
with large-scale datasets, their efficacy remains inadequate in low-resource settings …