wav2vec: Unsupervised pre-training for speech recognition

S Schneider, A Baevski, R Collobert, M Auli - arXiv preprint arXiv …, 2019 - arxiv.org
We explore unsupervised pre-training for speech recognition by learning representations of
raw audio. wav2vec is trained on large amounts of unlabeled audio data and the resulting …

Libri-Light: A benchmark for ASR with limited or no supervision

J Kahn, M Riviere, W Zheng… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
We introduce a new collection of spoken English audio suitable for training speech
recognition systems under limited or no supervision. It is derived from open-source audio …

SpeechBERT: An audio-and-text jointly learned language model for end-to-end spoken question answering

YS Chuang, CL Liu, HY Lee, L Lee - arXiv preprint arXiv:1910.11559, 2019 - arxiv.org
While various end-to-end models for spoken language understanding tasks have been
explored recently, this paper is probably the first known attempt to challenge the very difficult …

Almost unsupervised text to speech and automatic speech recognition

Y Ren, X Tan, T Qin, S Zhao… - … on machine learning, 2019 - proceedings.mlr.press
Text to speech (TTS) and automatic speech recognition (ASR) are two dual tasks in speech
processing and both achieve impressive performance thanks to the recent advance in deep …

Speech-Text Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment

T Yu, H Gao, TE Lin, M Yang, Y Wu, W Ma… - Proceedings of the …, 2023 - aclanthology.org
Recently, speech-text pre-training methods have shown remarkable success in many
speech and natural language processing tasks. However, most previous pre-trained models …

Completely unsupervised speech recognition by a generative adversarial network harmonized with iteratively refined hidden Markov models

KY Chen, CP Tsai, DR Liu, HY Lee, L Lee - arXiv preprint arXiv …, 2019 - arxiv.org
Producing a large annotated speech corpus for training ASR systems remains difficult for
more than 95% of languages all over the world which are low-resourced, but collecting a …

Leveraging pre-trained representations to improve access to untranscribed speech from endangered languages

N San, M Bartelds, M Browne, L Clifford… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
Pre-trained speech representations like wav2vec 2.0 are a powerful tool for automatic
speech recognition (ASR). Yet many endangered languages lack sufficient data for pre …

End-to-End speech recognition models for a low-resourced Indonesian Language

S Suyanto, A Arifianto, A Sirwan… - 2020 8th International …, 2020 - ieeexplore.ieee.org
Recent automatic speech recognition (ASR) is commonly developed using deep learning
(DL), instead of the Hidden Markov Model (HMM). Many researchers show that DL is much …

Speech-Text Dialog Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment

T Yu, H Gao, TE Lin, M Yang, Y Wu, W Ma… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, speech-text pre-training methods have shown remarkable success in many
speech and natural language processing tasks. However, most previous pre-trained models …

Syllable-Based Indonesian Automatic Speech Recognition

DH Galatang - International Journal on Electrical …, 2020 - search.ebscohost.com
The syllable-based automatic speech recognition (ASR) systems commonly perform better
than the phoneme-based ones. This paper focuses on developing an Indonesian …