[PDF][PDF] Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Recent progress in transformer-based medical image analysis

Z Liu, Q Lv, Z Yang, Y Li, CH Lee, L Shen - Computers in Biology and …, 2023 - Elsevier
The transformer is primarily used in the field of natural language processing. Recently, it has
been adopted and shows promise in the computer vision (CV) field. Medical image analysis …

Google usm: Scaling automatic speech recognition beyond 100 languages

Y Zhang, W Han, J Qin, Y Wang, A Bapna… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce the Universal Speech Model (USM), a single large model that performs
automatic speech recognition (ASR) across 100+ languages. This is achieved by pre …

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

Developing real-time streaming transformer transducer for speech recognition on large-scale dataset

X Chen, Y Wu, Z Wang, S Liu… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Recently, Transformer based end-to-end models have achieved great success in many
areas including speech recognition. However, compared to LSTM models, the heavy …

Transformers in speech processing: A survey

S Latif, A Zaidi, H Cuayahuitl, F Shamshad… - arXiv preprint arXiv …, 2023 - arxiv.org
The remarkable success of transformers in the field of natural language processing has
sparked the interest of the speech-processing community, leading to an exploration of their …

Parp: Prune, adjust and re-prune for self-supervised speech recognition

CIJ Lai, Y Zhang, AH Liu, S Chang… - Advances in …, 2021 - proceedings.neurips.cc
Self-supervised speech representation learning (speech SSL) has demonstrated the benefit
of scale in learning rich representations for Automatic Speech Recognition (ASR) with …

Contextualized streaming end-to-end speech recognition with trie-based deep biasing and shallow fusion

D Le, M Jain, G Keren, S Kim, Y Shi… - arXiv preprint arXiv …, 2021 - arxiv.org
How to leverage dynamic contextual information in end-to-end speech recognition has
remained an active research area. Previous solutions to this problem were either designed …

Xrbench: An extended reality (xr) machine learning benchmark suite for the metaverse

H Kwon, K Nair, J Seo, J Yik… - Proceedings of …, 2023 - proceedings.mlsys.org
Real-time multi-task multi-model (MTMM) workloads, a new form of deep learning inference
workloads, are emerging for applications areas like extended reality (XR) to support …

Understanding the role of self attention for efficient speech recognition

K Shim, J Choi, W Sung - International Conference on Learning …, 2022 - openreview.net
Self-attention (SA) is a critical component of Transformer neural networks that have
succeeded in automatic speech recognition (ASR). In this paper, we analyze the role of SA …