Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
MOTR: End-to-end multiple-object tracking with transformer
Temporal modeling of objects is a key challenge in multiple-object tracking (MOT). Existing
methods track by associating detections through motion-based and appearance-based …
Attention is all you need in speech separation
Recurrent Neural Networks (RNNs) have long been the dominant architecture in sequence-
to-sequence learning. RNNs, however, are inherently sequential models that do not allow …
Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis
P Ochieng - Artificial Intelligence Review, 2023 - Springer
Deep neural networks (DNN) techniques have become pervasive in domains such as
natural language processing and computer vision. They have achieved great success in …
Continuous speech separation with conformer
Continuous speech separation was recently proposed to deal with the overlapped speech in
natural conversations. While it was shown to significantly improve the speech recognition …
Gated recurrent fusion with joint training framework for robust end-to-end speech recognition
The joint training framework for speech enhancement and recognition methods have
obtained quite good performances for robust end-to-end automatic speech recognition …
Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition
Multi-source localization is an important and challenging technique for multi-talker
conversation analysis. This paper proposes a novel supervised learning method using deep …
ESPnet-SE: End-to-end speech enhancement and separation toolkit designed for ASR integration
We present ESPnet-SE, which is designed for the quick development of speech
enhancement and speech separation systems in a single framework, along with the optional …
A study of transformer-based end-to-end speech recognition system for Kazakh language
Today, the Transformer model, which allows parallelization and also has its own internal
attention, has been widely used in the field of speech recognition. The great advantage of …
Automatic lyrics transcription of polyphonic music with lyrics-chord multi-task learning
Lyrics are the words that make up a song, while chords are harmonic sets of multiple notes
in music. Lyrics and chords are generally essential information in music, i.e. unaccompanied …