Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
Recent developments on ESPnet toolkit boosted by Conformer
In this study, we present recent developments on ESPnet: End-to-End Speech Processing
toolkit, which mainly involves a recently proposed architecture called Conformer …
Ego4D: Around the world in 3,000 hours of egocentric video
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …
A general survey on attention mechanisms in deep learning
G Brauwers, F Frasincar - IEEE Transactions on Knowledge …, 2021 - ieeexplore.ieee.org
Attention is an important mechanism that can be employed for a variety of deep learning
models across many different domains and tasks. This survey provides an overview of the …
Branchformer: Parallel MLP-attention architectures to capture local and global context for speech recognition and understanding
Conformer has proven to be effective in many speech processing tasks. It combines the
benefits of extracting local dependencies using convolutions and global dependencies …
End-to-end speech recognition: A survey
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …
WeNet: Production oriented streaming and non-streaming end-to-end speech recognition toolkit
In this paper, we propose an open source, production first, and production ready speech
recognition toolkit called WeNet in which a new two-pass approach is implemented to unify …
Attention, please! A survey of neural attention models in deep learning
A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer
In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …
WenetSpeech: A 10000+ hours multi-domain Mandarin corpus for speech recognition
In this paper, we present WenetSpeech, a multi-domain Mandarin corpus consisting of
10000+ hours high-quality labeled speech, 2400+ hours weakly labeled speech, and about …
Streaming end-to-end speech recognition for mobile devices
End-to-end (E2E) models, which directly predict output character sequences given input
speech, are good candidates for on-device speech recognition. E2E models, however …