Pengi: An audio language model for audio tasks

S Deshmukh, B Elizalde, R Singh… - Advances in Neural …, 2023 - proceedings.neurips.cc
In the domain of audio processing, Transfer Learning has facilitated the rise of Self-
Supervised Learning and Zero-Shot Learning techniques. These approaches have led to …

Transforming the embeddings: A lightweight technique for speech emotion recognition tasks

OC Phukan, AB Buduru, R Sharma - arXiv preprint arXiv:2305.18640, 2023 - arxiv.org
Speech emotion recognition (SER) is a field that has drawn a lot of attention due to its
applications in diverse fields. A current trend in methods used for SER is to leverage …

StyleCap: Automatic Speaking-Style Captioning from Speech Based on Speech and Language Self-supervised Learning Models

K Yamauchi, Y Ijima, Y Saito - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
We propose StyleCap, a method to generate natural language descriptions of speaking
styles appearing in speech. Although most conventional techniques for para-/non …

Large Language Model-Based Emotional Speech Annotation Using Context and Acoustic Feature for Speech Emotion Recognition

J Santoso, K Ishizuka… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
The remarkable emergence of large language models (LLM) and their vast capabilities have
opened a possibility for applications in various fields, including speech emotion recognition …

Solution for Emotion Prediction Competition of Workshop on Emotionally and Culturally Intelligent AI

S Xu, Z Chi, Y Yang - arXiv preprint arXiv:2403.17683, 2024 - arxiv.org
This report provides a detailed description of the method that we explored and proposed in
the WECIA Emotion Prediction Competition (EPC), which predicts a person's emotion …

Interpretable multimodal sentiment analysis based on textual modality descriptions by using large-scale language models

S Li, S Okada - arXiv preprint arXiv:2305.06162, 2023 - arxiv.org
Multimodal sentiment analysis is an important area for understanding the user's internal
states. Deep learning methods were effective, but the problem of poor interpretability has …

Speech Emotion Recognition Based on 1D CNN and MFCC

G Li, Y Liu, X Wang - … IEEE 5th International Conference on Civil …, 2023 - ieeexplore.ieee.org
With the widespread application and growing popularity of speech technology, the field of
speech emotion recognition has garnered significant attention in scientific research. This …

Domain Adaptation for Contrastive Audio-Language Models

S Deshmukh, R Singh, B Raj - arXiv preprint arXiv:2402.09585, 2024 - arxiv.org
Audio-Language Models (ALM) aim to be general-purpose audio models by providing zero-
shot capabilities at test time. The zero-shot performance of ALM improves by using suitable …