Unsupervised speech recognition

A Baevski, WN Hsu, A Conneau… - Advances in Neural …, 2021 - proceedings.neurips.cc
Despite rapid progress in the recent past, current speech recognition systems still require
labeled training data which limits this technology to a small fraction of the languages spoken …

Pushing the limits of semi-supervised learning for automatic speech recognition

Y Zhang, J Qin, DS Park, W Han, CC Chiu… - arXiv preprint arXiv …, 2020 - arxiv.org
We employ a combination of recent developments in semi-supervised learning for automatic
speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled …

Semi-orthogonal low-rank matrix factorization for deep neural networks.

D Povey, G Cheng, Y Wang, K Li, H Xu… - Interspeech, 2018 - academia.edu
Time Delay Neural Networks (TDNNs), also known as one-dimensional
Convolutional Neural Networks (1-d CNNs), are an efficient and well-performing neural …

TED-LIUM 3: Twice as much data and corpus repartition for experiments on speaker adaptation

F Hernandez, V Nguyen, S Ghannay… - Speech and Computer …, 2018 - Springer
In this paper, we present TED-LIUM release 3 corpus (TED-LIUM 3 is available on
https://lium.univ-lemans.fr/ted-lium3/) dedicated to speech recognition in English, which …

Towards end-to-end unsupervised speech recognition

AH Liu, WN Hsu, M Auli… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Unsupervised speech recognition has shown great potential to make Automatic Speech
Recognition (ASR) systems accessible to every language. However, existing methods still …

Quantifying bias in automatic speech recognition

S Feng, O Kudina, BM Halpern… - arXiv preprint arXiv …, 2021 - arxiv.org
Automatic speech recognition (ASR) systems promise to deliver objective interpretation of
human speech. Practice and recent evidence suggest that the state-of-the-art (SotA) ASRs …

Towards inclusive automatic speech recognition

S Feng, BM Halpern, O Kudina… - Computer Speech & …, 2024 - Elsevier
Practice and recent evidence show that state-of-the-art (SotA) automatic speech recognition
(ASR) systems do not perform equally well for all speaker groups. Many factors can cause …

A pruned RNNLM lattice-rescoring algorithm for automatic speech recognition

H Xu, T Chen, D Gao, Y Wang, K Li… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
Lattice-rescoring is a common approach to take advantage of recurrent neural language
models in ASR, where a word-lattice is generated from 1st-pass decoding and the lattice is …

Speech enhancement based on teacher–student deep learning using improved speech presence probability for noise-robust speech recognition

YH Tu, J Du, CH Lee - IEEE/ACM Transactions on Audio …, 2019 - ieeexplore.ieee.org
In this paper, we propose a novel teacher-student learning framework for the preprocessing
of a speech recognizer, leveraging the online noise tracking capabilities of improved minima …

Advanced rich transcription system for Estonian speech

T Alumäe, O Tilk - Human language technologies–the Baltic …, 2018 - ebooks.iospress.nl
This paper describes the current TTÜ speech transcription system for Estonian speech. The
system is designed to handle semi-spontaneous speech, such as broadcast conversations …