Classification of vocal intensity category from speech using the wav2vec2 and whisper embeddings

L Zhang, N Jiang, Q Wang, Y Li, Q Lu, L Xie - Speech Communication, 2024 - Elsevier

Trained on 680,000 h of massive speech data, Whisper is a multitasking, multilingual
speech foundation model demonstrating superior performance in automatic speech …

AVID: A speech database for machine learning studies on vocal intensity

P Alku, M Kodali, L Laaksonen, SR Kadiri - Speech Communication, 2024 - Elsevier

Vocal intensity, which is quantified typically with the sound pressure level (SPL), is a key
feature of speech. To measure SPL from speech recordings, a standard calibration tone …

Whisper in Focus: Enhancing Stuttered Speech Classification with Encoder Layer Optimization

H Ameer, S Latif, R Latif, S Mukhtar - arXiv preprint arXiv:2311.05203, 2023 - arxiv.org

In recent years, advancements in the field of speech processing have led to cutting-edge
deep learning algorithms with immense potential for real-world applications. The automated …

Deep Learning for Automatic Classification of Speech Intensity Modes

L Ansari - 2023 - aaltodoc.aalto.fi

One of the fundamental phenomena in speech processing is speech intensity. As a concept,
speech intensity and its regulation help capture various aspects as well as changes in the …

Whisper-Sv: Adapting Whisper for Low-Resource Speaker Verification

L Zhang, N Jiang, Q Wang, Y Li, Q Lu, L Xie - Available at SSRN 4725202 - papers.ssrn.com

Recent advancements in automatic speech recognition (ASR), exemplified by Whisper, have
demonstrated the potential of these systems to approach human-level performance given …