Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

VarArray meets t-SOT: Advancing the state of the art of streaming distant conversational speech recognition

N Kanda, J Wu, X Wang, Z Chen, J Li… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
This paper presents a novel streaming automatic speech recognition (ASR) framework for
multi-talker overlapping speech captured by a distant microphone array with an arbitrary …

Serialized output training for end-to-end overlapped speech recognition

N Kanda, Y Gaur, X Wang, Z Meng… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper proposes serialized output training (SOT), a novel framework for multi-speaker
overlapped speech recognition based on an attention-based encoder-decoder approach …

Streaming multi-talker ASR with token-level serialized output training

N Kanda, J Wu, Y Wu, X Xiao, Z Meng, X Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper proposes a token-level serialized output training (t-SOT), a novel framework for
streaming multi-talker automatic speech recognition (ASR). Unlike existing streaming multi …

A Conformer-based ASR frontend for joint acoustic echo cancellation, speech enhancement and speech separation

T O'Malley, A Narayanan, Q Wang… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
We present a frontend for improving robustness of automatic speech recognition (ASR), that
jointly implements three modules within a single model: acoustic echo cancellation, speech …

Large-scale pre-training of end-to-end multi-talker ASR for meeting transcription with single distant microphone

N Kanda, G Ye, Y Wu, Y Gaur, X Wang, Z Meng… - arXiv preprint arXiv …, 2021 - arxiv.org
Transcribing meetings containing overlapped speech with only a single distant microphone
(SDM) has been one of the most challenging problems for automatic speech recognition …

Streaming end-to-end multi-talker speech recognition

L Lu, N Kanda, J Li, Y Gong - IEEE Signal Processing Letters, 2021 - ieeexplore.ieee.org
End-to-end multi-talker speech recognition is an emerging research trend in the speech
community due to its vast potential in applications such as conversation and meeting …

Streaming multi-speaker ASR with RNN-T

I Sklyar, A Piunova, Y Liu - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Recent research shows end-to-end ASR systems can recognize overlapped speech from
multiple speakers. However, all published works have assumed no latency constraints …

Multi-turn RNN-T for streaming recognition of multi-party speech

I Sklyar, A Piunova, X Zheng… - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Automatic speech recognition (ASR) of single channel far-field recordings with an unknown
number of speakers is traditionally tackled by cascaded modules. Recent research shows …

A sidecar separator can convert a single-talker speech recognition system to a multi-talker one

L Meng, J Kang, M Cui, Y Wang, X Wu… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Although automatic speech recognition (ASR) can perform well in common non-overlapping
environments, sustaining performance in multi-talker overlapping speech recognition …