Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community has seen a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Motr: End-to-end multiple-object tracking with transformer

F Zeng, B Dong, Y Zhang, T Wang, X Zhang… - European Conference on …, 2022 - Springer
Temporal modeling of objects is a key challenge in multiple-object tracking (MOT). Existing
methods track by associating detections through motion-based and appearance-based …
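
The entry above contrasts MOTR's end-to-end design with conventional tracking-by-detection pipelines that associate detections across frames. As a point of reference only (not the paper's method), such an association step can be sketched with an IoU cost matrix and the Hungarian algorithm; the function names and the 0.3 threshold below are illustrative assumptions.

```python
# Minimal tracking-by-detection association step: match existing tracks to new
# detections by IoU cost with the Hungarian algorithm. Illustrative only; MOTR
# itself replaces this hand-crafted matching with learned track queries.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, iou_threshold=0.3):
    """Return (track_idx, det_idx) pairs whose IoU exceeds the threshold."""
    if not tracks or not detections:
        return []
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)   # minimize total (1 - IoU)
    return [(r, c) for r, c in zip(rows, cols)
            if 1.0 - cost[r, c] >= iou_threshold]

# Example: two existing tracks, two detections in the next frame.
tracks = [[0, 0, 10, 10], [20, 20, 30, 30]]
dets = [[21, 19, 31, 29], [1, 0, 11, 10]]
print(associate(tracks, dets))   # [(0, 1), (1, 0)]
```

As the title indicates, MOTR instead learns this step end-to-end with a Transformer, so no explicit matching rule like the one above is needed at inference time.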

Attention is all you need in speech separation

C Subakan, M Ravanelli, S Cornell… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Recurrent Neural Networks (RNNs) have long been the dominant architecture in sequence-
to-sequence learning. RNNs, however, are inherently sequential models that do not allow …
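
The snippet's contrast is architectural: an RNN must consume a sequence one step at a time, whereas attention-based models (as in this paper) relate all time frames to each other with a few matrix products. A minimal NumPy sketch of scaled dot-product self-attention is shown below; the shapes and random weights are illustrative assumptions, not the paper's configuration.

```python
# Minimal scaled dot-product self-attention over a whole sequence at once,
# illustrating why attention avoids the step-by-step recurrence of an RNN.
# Shapes and weights are illustrative, not the SepFormer configuration.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """x: (T, d) sequence of T frames; returns (T, d) computed in parallel over T."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v          # project all frames at once
    scores = q @ k.T / np.sqrt(k.shape[-1])      # (T, T) pairwise interactions
    return softmax(scores) @ v                   # every frame attends to every frame

rng = np.random.default_rng(0)
T, d = 100, 64                                   # e.g. 100 time frames of 64-dim features
x = rng.standard_normal((T, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
y = self_attention(x, w_q, w_k, w_v)
print(y.shape)                                   # (100, 64): no sequential loop over T
```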

Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis

P Ochieng - Artificial Intelligence Review, 2023 - Springer
Deep neural network (DNN) techniques have become pervasive in domains such as
natural language processing and computer vision. They have achieved great success in …

Continuous speech separation with conformer

S Chen, Y Wu, Z Chen, J Wu, J Li… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Continuous speech separation was recently proposed to deal with the overlapped speech in
natural conversations. While it was shown to significantly improve the speech recognition …

Gated recurrent fusion with joint training framework for robust end-to-end speech recognition

C Fan, J Yi, J Tao, Z Tian, B Liu… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Joint training frameworks for speech enhancement and recognition have achieved good
performance for robust end-to-end automatic speech recognition …
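
The joint training referred to here follows a common pattern: an enhancement front-end and an E2E ASR back-end are optimized together under a single weighted loss so that gradients reach both modules. The sketch below illustrates only that generic pattern under assumed stand-in modules and an assumed weight lam; the paper's gated recurrent fusion of noisy and enhanced features is not reproduced.

```python
# Generic joint-training pattern for an enhancement front-end plus an E2E ASR
# back-end: one weighted loss, gradients flow through both modules. The module
# definitions, CTC setup, and weight lam are illustrative assumptions.
import torch
import torch.nn as nn

class Enhancer(nn.Module):            # stand-in enhancement front-end
    def __init__(self, dim=80):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, dim))
    def forward(self, noisy):
        return self.net(noisy)        # (B, T, dim) enhanced features

class Recognizer(nn.Module):          # stand-in E2E ASR back-end with a CTC head
    def __init__(self, dim=80, vocab=32):
        super().__init__()
        self.rnn = nn.GRU(dim, 256, batch_first=True)
        self.out = nn.Linear(256, vocab)
    def forward(self, feats):
        h, _ = self.rnn(feats)
        return self.out(h).log_softmax(-1)   # (B, T, vocab) log-probabilities

enh, asr = Enhancer(), Recognizer()
opt = torch.optim.Adam(list(enh.parameters()) + list(asr.parameters()), lr=1e-3)
ctc = nn.CTCLoss(blank=0)
lam = 0.3                              # illustrative weight on the enhancement loss

noisy = torch.randn(4, 120, 80)        # toy batch: (B, T, dim)
clean = torch.randn(4, 120, 80)
tokens = torch.randint(1, 32, (4, 20))
in_lens = torch.full((4,), 120)
tgt_lens = torch.full((4,), 20)

enhanced = enh(noisy)
logp = asr(enhanced).transpose(0, 1)   # CTCLoss expects (T, B, vocab)
loss = ctc(logp, tokens, in_lens, tgt_lens) + lam * nn.functional.mse_loss(enhanced, clean)
loss.backward()                        # gradients reach both modules
opt.step()
```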

Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition

AS Subramanian, C Weng, S Watanabe, M Yu… - Computer Speech & …, 2022 - Elsevier
Multi-source localization is an important and challenging technique for multi-talker
conversation analysis. This paper proposes a novel supervised learning method using deep …

ESPnet-SE: End-to-end speech enhancement and separation toolkit designed for ASR integration

C Li, J Shi, W Zhang, AS Subramanian… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
We present ESPnet-SE, which is designed for the quick development of speech
enhancement and speech separation systems in a single framework, along with the optional …

A study of transformer-based end-to-end speech recognition system for Kazakh language

M Orken, O Dina, A Keylan, T Tolganay, O Mohamed - Scientific reports, 2022 - nature.com
Today, the Transformer model, which allows parallelization and has a built-in self-attention
mechanism, has been widely used in the field of speech recognition. The great advantage of …

Automatic lyrics transcription of polyphonic music with lyrics-chord multi-task learning

X Gao, C Gupta, H Li - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
Lyrics are the words that make up a song, while chords are harmonic sets of multiple notes
in music. Lyrics and chords are generally essential information in music, i.e., unaccompanied …