Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Enabling resource-efficient AIoT system with cross-level optimization: A survey

S Liu, B Guo, C Fang, Z Wang, S Luo… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
The emerging field of artificial intelligence of things (AIoT, AI+IoT) is driven by the
widespread use of intelligent infrastructures and the impressive success of deep learning …

JOIST: A joint speech and text streaming model for ASR

TN Sainath, R Prabhavalkar, A Bapna… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
We present JOIST, an algorithm to train a streaming, cascaded, encoder end-to-end (E2E)
model with both speech-text paired inputs, and text-only unpaired inputs. Unlike previous …

Tied & reduced RNN-T decoder

R Botros, TN Sainath, R David, E Guzman, W Li… - arXiv preprint arXiv …, 2021 - arxiv.org
Previous works on the Recurrent Neural Network-Transducer (RNN-T) models have shown
that, under some conditions, it is possible to simplify its prediction network with little or no …

Electrical energy prediction in residential buildings for short-term horizons using hybrid deep learning strategy

ZA Khan, A Ullah, W Ullah, S Rho, M Lee, SW Baik - Applied Sciences, 2020 - mdpi.com
Smart grid technology based on renewable energy and energy storage systems are
attracting considerable attention towards energy crises. Accurate and reliable model for …

Wav2vec-C: A self-supervised model for speech representation learning

S Sadhu, D He, CW Huang, SH Mallidi, M Wu… - arXiv preprint arXiv …, 2021 - arxiv.org
Wav2vec-C introduces a novel representation learning technique combining elements from
wav2vec 2.0 and VQ-VAE. Our model learns to reproduce quantized representations from …

ASRTest: automated testing for deep-neural-network-driven speech recognition systems

P Ji, Y Feng, J Liu, Z Zhao, Z Chen - Proceedings of the 31st ACM …, 2022 - dl.acm.org
With the rapid development of deep neural networks and end-to-end learning techniques,
automatic speech recognition (ASR) systems have been deployed into our daily and assist …

Personalization strategies for end-to-end speech recognition systems

A Gourav, L Liu, A Gandhe, Y Gu, G Lan… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
The recognition of personalized content, such as contact names, remains a challenging
problem for end-to-end speech recognition systems. In this work, we demonstrate how first …

Less is more: Improved RNN-T decoding using limited label context and path merging

R Prabhavalkar, Y He, D Rybach… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
End-to-end models that condition the output sequence on all previously predicted labels
have emerged as popular alternatives to conventional systems for automatic speech …

Efficient training of neural transducer for speech recognition

W Zhou, W Michel, R Schlüter, H Ney - arXiv preprint arXiv:2204.10586, 2022 - arxiv.org
As one of the most popular sequence-to-sequence modeling approaches for speech
recognition, the RNN-Transducer has achieved evolving performance with more and more …