
Pengi: An audio language model for audio tasks

S Deshmukh, B Elizalde, R Singh… - Advances in Neural …, 2023 - proceedings.neurips.cc

In the domain of audio processing, Transfer Learning has facilitated the rise of Self-Supervised Learning and Zero-Shot Learning techniques. These approaches have led to …

Cited by 54 | Related articles | All 5 versions

[PDF] arxiv.org

Transforming the embeddings: A lightweight technique for speech emotion recognition tasks

OC Phukan, AB Buduru, R Sharma - arXiv preprint arXiv:2305.18640, 2023 - arxiv.org

Speech emotion recognition (SER) is a field that has drawn a lot of attention due to its
applications in diverse fields. A current trend in methods used for SER is to leverage …

Cited by 4 | Related articles | All 4 versions

[PDF] arxiv.org

StyleCap: Automatic Speaking-Style Captioning from Speech Based on Speech and Language Self-supervised Learning Models

K Yamauchi, Y Ijima, Y Saito - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org

We propose StyleCap, a method to generate natural language descriptions of speaking
styles appearing in speech. Although most of conventional techniques for para-/non …

Cited by 2 | Related articles | All 5 versions

Large Language Model-Based Emotional Speech Annotation Using Context and Acoustic Feature for Speech Emotion Recognition

J Santoso, K Ishizuka… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org

The remarkable emergence of large language models (LLM) and their vast capabilities have
opened a possibility for applications in various fields, including speech emotion recognition …

[PDF] arxiv.org

Domain Adaptation for Contrastive Audio-Language Models

S Deshmukh, R Singh, B Raj - arXiv preprint arXiv:2402.09585, 2024 - arxiv.org

Audio-Language Models (ALM) aim to be general-purpose audio models by providing zero-shot capabilities at test time. The zero-shot performance of ALM improves by using suitable …


Solution for Emotion Prediction Competition of Workshop on Emotionally and Culturally Intelligent AI

Interpretable multimodal sentiment analysis based on textual modality descriptions by using large-scale language models

Speech Emotion Recognition Based on 1D CNN and MFCC

