Almost-unsupervised speech recognition with close-to-zero resource based on phonetic structures...

wav2vec: Unsupervised pre-training for speech recognition

S Schneider, A Baevski, R Collobert, M Auli - arXiv preprint arXiv …, 2019 - arxiv.org

We explore unsupervised pre-training for speech recognition by learning representations of
raw audio. wav2vec is trained on large amounts of unlabeled audio data and the resulting …

Cited by 1458 | Related articles | All 12 versions

Libri-light: A benchmark for asr with limited or no supervision

J Kahn, M Riviere, W Zheng… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

We introduce a new collection of spoken English audio suitable for training speech
recognition systems under limited or no supervision. It is derived from open-source audio …

Cited by 577 | Related articles | All 13 versions

Speechbert: An audio-and-text jointly learned language model for end-to-end spoken question answering

YS Chuang, CL Liu, HY Lee, L Lee - arXiv preprint arXiv:1910.11559, 2019 - arxiv.org

While various end-to-end models for spoken language understanding tasks have been
explored recently, this paper is probably the first known attempt to challenge the very difficult …

Cited by 122 | Related articles | All 6 versions

Almost unsupervised text to speech and automatic speech recognition

Y Ren, X Tan, T Qin, S Zhao… - … on machine learning, 2019 - proceedings.mlr.press

Text to speech (TTS) and automatic speech recognition (ASR) are two dual tasks in speech
processing and both achieve impressive performance thanks to the recent advance in deep …

Cited by 118 | Related articles | All 7 versions

Speech-Text Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment

T Yu, H Gao, TE Lin, M Yang, Y Wu, W Ma… - Proceedings of the …, 2023 - aclanthology.org

Recently, speech-text pre-training methods have shown remarkable success in many
speech and natural language processing tasks. However, most previous pre-trained models …

Cited by 9 | Related articles | All 2 versions

Completely unsupervised speech recognition by a generative adversarial network harmonized with iteratively refined hidden markov models

KY Chen, CP Tsai, DR Liu, HY Lee, L Lee - arXiv preprint arXiv …, 2019 - arxiv.org

Producing a large annotated speech corpus for training ASR systems remains difficult for
more than 95% of languages all over the world which are low-resourced, but collecting a …

Cited by 42 | Related articles | All 6 versions

Leveraging pre-trained representations to improve access to untranscribed speech from endangered languages

N San, M Bartelds, M Browne, L Clifford… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org

Pre-trained speech representations like wav2vec 2.0 are a powerful tool for automatic
speech recognition (ASR). Yet many endangered languages lack sufficient data for pre …

Cited by 14 | Related articles | All 7 versions

End-to-End speech recognition models for a low-resourced Indonesian Language

S Suyanto, A Arifianto, A Sirwan… - 2020 8th International …, 2020 - ieeexplore.ieee.org

Recent automatic speech recognition (ASR) is commonly developed using deep learning
(DL), instead of the Hidden Markov Model (HMM). Many researchers show that DL is much …

Cited by 17 | Related articles

Speech-Text Dialog Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment

T Yu, H Gao, TE Lin, M Yang, Y Wu, W Ma… - arXiv preprint arXiv …, 2023 - arxiv.org

Recently, speech-text pre-training methods have shown remarkable success in many
speech and natural language processing tasks. However, most previous pre-trained models …

Cited by 2 | Related articles | All 3 versions

Syllable-Based Indonesian Automatic Speech Recognition.

DH Galatang - International Journal on Electrical …, 2020 - search.ebscohost.com

The syllable-based automatic speech recognition (ASR) systems commonly perform better
than the phoneme-based ones. This paper focuses on developing an Indonesian …

Cited by 4 | Related articles | All 2 versions