Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community has been seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Enabling resource-efficient aiot system with cross-level optimization: A survey

S Liu, B Guo, C Fang, Z Wang, S Luo… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
The emerging field of artificial intelligence of things (AIoT, AI + IoT) is driven by the
widespread use of intelligent infrastructures and the impressive success of deep learning …

Structured pruning of large language models

Z Wang, J Wohlwend, T Lei - arXiv preprint arXiv:1910.04732, 2019 - arxiv.org
Large language models have recently achieved state-of-the-art performance across a wide
variety of natural language tasks. Meanwhile, the size of these models and their latency …

VoiceFilter-Lite: Streaming targeted voice separation for on-device speech recognition

Q Wang, IL Moreno, M Saglam, K Wilson… - arXiv preprint arXiv …, 2020 - arxiv.org
We introduce VoiceFilter-Lite, a single-channel source separation model that runs on the
device to preserve only the speech signals from a target user, as part of a streaming speech …

Parp: Prune, adjust and re-prune for self-supervised speech recognition

CIJ Lai, Y Zhang, AH Liu, S Chang… - Advances in …, 2021 - proceedings.neurips.cc
Self-supervised speech representation learning (speech SSL) has demonstrated the benefit
of scale in learning rich representations for Automatic Speech Recognition (ASR) with …

Coarsening the granularity: Towards structurally sparse lottery tickets

T Chen, X Chen, X Ma, Y Wang… - … conference on machine …, 2022 - proceedings.mlr.press
The lottery ticket hypothesis (LTH) has shown that dense models contain highly sparse
subnetworks (i.e., winning tickets) that can be trained in isolation to match full accuracy …

When attention meets fast recurrence: Training language models with reduced compute

T Lei - arXiv preprint arXiv:2102.12459, 2021 - arxiv.org
Large language models have become increasingly difficult to train because of the growing
computation time and cost. In this work, we present SRU++, a highly-efficient architecture …

Alignment restricted streaming recurrent neural network transducer

J Mahadeokar, Y Shangguan, D Le… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
There is a growing interest in the speech community in developing Recurrent Neural
Network Transducer (RNN-T) models for automatic speech recognition (ASR) applications …

Automatic speech recognition using limited vocabulary: A survey

JLKE Fendji, DCM Tala, BO Yenke… - Applied Artificial …, 2022 - Taylor & Francis
Automatic Speech Recognition (ASR) is an active field of research due to its
large number of applications and the proliferation of interfaces or computing devices that …

CHIMERA: A 0.92-TOPS, 2.2-TOPS/W edge AI accelerator with 2-MByte on-chip foundry resistive RAM for efficient training and inference

K Prabhu, A Gural, ZF Khan… - IEEE Journal of Solid …, 2022 - ieeexplore.ieee.org
Implementing edge artificial intelligence (AI) inference and training is challenging with
current memory technologies. As deep neural networks (DNNs) grow in size, this problem is …