wav2vec: Unsupervised pre-training for speech recognition
We explore unsupervised pre-training for speech recognition by learning representations of
raw audio. wav2vec is trained on large amounts of unlabeled audio data and the resulting …
raw audio. wav2vec is trained on large amounts of unlabeled audio data and the resulting …
Libri-light: A benchmark for asr with limited or no supervision
We introduce a new collection of spoken English audio suitable for training speech
recognition systems under limited or no supervision. It is derived from open-source audio …
recognition systems under limited or no supervision. It is derived from open-source audio …
Speechbert: An audio-and-text jointly learned language model for end-to-end spoken question answering
While various end-to-end models for spoken language understanding tasks have been
explored recently, this paper is probably the first known attempt to challenge the very difficult …
explored recently, this paper is probably the first known attempt to challenge the very difficult …
Almost unsupervised text to speech and automatic speech recognition
Text to speech (TTS) and automatic speech recognition (ASR) are two dual tasks in speech
processing and both achieve impressive performance thanks to the recent advance in deep …
processing and both achieve impressive performance thanks to the recent advance in deep …
Speech-Text Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment
Recently, speech-text pre-training methods have shown remarkable success in many
speech and natural language processing tasks. However, most previous pre-trained models …
speech and natural language processing tasks. However, most previous pre-trained models …
Completely unsupervised speech recognition by a generative adversarial network harmonized with iteratively refined hidden markov models
Producing a large annotated speech corpus for training ASR systems remains difficult for
more than 95% of languages all over the world which are low-resourced, but collecting a …
more than 95% of languages all over the world which are low-resourced, but collecting a …
Leveraging pre-trained representations to improve access to untranscribed speech from endangered languages
Pre-trained speech representations like wav2vec 2.0 are a powerful tool for automatic
speech recognition (ASR). Yet many endangered languages lack sufficient data for pre …
speech recognition (ASR). Yet many endangered languages lack sufficient data for pre …
End-to-End speech recognition models for a low-resourced Indonesian Language
S Suyanto, A Arifianto, A Sirwan… - 2020 8th International …, 2020 - ieeexplore.ieee.org
Recent automatic speech recognition (ASR) is commonly developed using deep learning
(DL), instead of the Hidden Markov Model (HMM). Many researchers show that DL is much …
(DL), instead of the Hidden Markov Model (HMM). Many researchers show that DL is much …
Speech-Text Dialog Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment
Recently, speech-text pre-training methods have shown remarkable success in many
speech and natural language processing tasks. However, most previous pre-trained models …
speech and natural language processing tasks. However, most previous pre-trained models …
Syllable-Based Indonesian Automatic Speech Recognition.
DH Galatang - International Journal on Electrical …, 2020 - search.ebscohost.com
The syllable-based automatic speech recognition (ASR) systems commonly perform better
than the phoneme-based ones. This paper focuses on developing an Indonesian …
than the phoneme-based ones. This paper focuses on developing an Indonesian …