How might we create better benchmarks for speech recognition?

A Aksënova, D van Esch, J Flynn… - Proceedings of the 1st …, 2021 - aclanthology.org
The applications of automatic speech recognition (ASR) systems are proliferating, in part
due to recent significant quality improvements. However, as recent work indicates, even …

Automatic disfluency detection from untranscribed speech

A Romana, K Koishida… - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
Speech disfluencies, such as filled pauses or repetitions, are disruptions in the typical flow of
speech. All speakers experience disfluencies at times, and the rate at which we produce …

Analysis and tuning of a voice assistant system for dysfluent speech

V Mitra, Z Huang, C Lea, L Tooley, S Wu… - arXiv preprint arXiv …, 2021 - arxiv.org
Dysfluencies and variations in speech pronunciation can severely degrade speech
recognition performance, and for many individuals with moderate-to-severe speech …

Enhancing ASR for stuttered speech with limited data using detect and pass

O Shonibare, X Tong, V Ravichandran - arXiv preprint arXiv:2202.05396, 2022 - arxiv.org
It is estimated that around 70 million people worldwide are affected by a speech disorder
called stuttering. With recent advances in Automatic Speech Recognition (ASR), voice …

End-to-End Spontaneous Speech Recognition Using Disfluency Labeling

K Horii, M Fukuda, K Ohta, R Nishimura, A Ogawa… - Interspeech, 2022 - isca-archive.org
Spontaneous speech often contains disfluent acoustic features such as fillers and
hesitations, which are major causes of errors during automatic speech recognition (ASR). In …

Whister: Using Whisper's representations for stuttering detection

V Changawala, F Rudzicz - Interspeech, 2024 - isca-archive.org
In this paper, we empirically investigate the influence of different factors on the performance
of dysfluency detection. Specifically, we examine the impact of data splits, data quality, and …

Frame-Level Stutter Detection

JB Harvill, M Hasegawa-Johnson, CD Yoo - INTERSPEECH, 2022 - isca-archive.org
Previous studies on the detection of stuttered speech have focused on classification at the
utterance level (e.g., for speech therapy applications), and on the correct insertion of stutter …

On disfluency and non-lexical sound labeling for end-to-end automatic speech recognition

P Mihajlik, Y Meng, MS Kadar, J Linke, B Schuppler… - Interspeech, 2024 - isca-archive.org
Spontaneous speech contains a significant amount of disfluencies and non-lexical sounds
(e.g., backchannels, filled pauses), which are often difficult to transcribe. Disfluency labeling …

Survey: Exploring disfluencies for speech-to-speech machine translation

R Kundu, P Jyothi, P Bhattacharyya - 2022 - cfilt.iitb.ac.in
Disfluencies that appear in the transcriptions from automatic speech recognition systems
tend to impair the performance of downstream NLP tasks like machine translation …

Efficient Recognition and Classification of Stuttered Word from Speech Signal using Deep Learning Technique

K Murugan, NK Cherukuri… - 2022 IEEE World …, 2022 - ieeexplore.ieee.org
Fluency is a metric that assesses how well a speaker communicates with another person
while presenting the information. Stuttering is one of the fluency problems that have a …