Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community has been seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
Branchformer: Parallel MLP-attention architectures to capture local and global context for speech recognition and understanding
Conformer has proven to be effective in many speech processing tasks. It combines the
benefits of extracting local dependencies using convolutions and global dependencies …
End-to-end speech recognition: A survey
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …
GigaSpeech: An evolving, multi-domain ASR corpus with 10,000 hours of transcribed audio
This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition
corpus with 10,000 hours of high quality labeled audio suitable for supervised training, and …
WenetSpeech: A 10000+ hours multi-domain Mandarin corpus for speech recognition
In this paper, we present WenetSpeech, a multi-domain Mandarin corpus consisting of
10000+ hours of high-quality labeled speech, 2400+ hours of weakly labeled speech, and about …
Squeezeformer: An efficient transformer for automatic speech recognition
The recently proposed Conformer model has become the de facto backbone model for
various downstream speech tasks based on its hybrid attention-convolution architecture that …
E-branchformer: Branchformer with enhanced merging for speech recognition
Conformer, combining convolution and self-attention sequentially to capture both local and
global information, has shown remarkable performance and is currently regarded as the …
Opencpop: A high-quality open source Chinese popular song corpus for singing voice synthesis
This paper introduces Opencpop, a publicly available high-quality Mandarin singing corpus
designed for singing voice synthesis (SVS). The corpus consists of 100 popular Mandarin …
An exploration of self-supervised pretrained representations for end-to-end speech recognition
Self-supervised pretraining on speech data has made substantial progress. High-fidelity
representations of the speech signal are learned from large amounts of untranscribed data and show …
The singing voice conversion challenge 2023
We present the latest iteration of the Voice Conversion Challenge (VCC) series, a biennial
scientific event aiming to compare and understand different voice conversion (VC) systems …