Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community has seen a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Motr: End-to-end multiple-object tracking with transformer

F Zeng, B Dong, Y Zhang, T Wang, X Zhang… - European Conference on …, 2022 - Springer
Temporal modeling of objects is a key challenge in multiple-object tracking (MOT). Existing
methods track by associating detections through motion-based and appearance-based …
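
The entry above contrasts MOTR's end-to-end design with conventional tracking-by-detection pipelines that associate detections across frames. As a point of reference only (not the paper's method), such an association step can be sketched with an IoU cost matrix and the Hungarian algorithm; the function names and the 0.3 threshold below are illustrative assumptions.

```python
# Minimal tracking-by-detection association step: match existing tracks to new
# detections by IoU cost with the Hungarian algorithm. Illustrative only; MOTR
# itself replaces this hand-crafted matching with learned track queries.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, iou_threshold=0.3):
    """Return (track_idx, det_idx) pairs whose IoU exceeds the threshold."""
    if not tracks or not detections:
        return []
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)   # minimize total (1 - IoU)
    return [(r, c) for r, c in zip(rows, cols)
            if 1.0 - cost[r, c] >= iou_threshold]

# Example: two existing tracks, two detections in the next frame.
tracks = [[0, 0, 10, 10], [20, 20, 30, 30]]
dets = [[21, 19, 31, 29], [1, 0, 11, 10]]
print(associate(tracks, dets))   # [(0, 1), (1, 0)]
```

As the title indicates, MOTR instead learns this step end-to-end with a Transformer, so no explicit matching rule like the one above is needed at inference time.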

Attention is all you need in speech separation

C Subakan, M Ravanelli, S Cornell… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Recurrent Neural Networks (RNNs) have long been the dominant architecture in sequence-
to-sequence learning. RNNs, however, are inherently sequential models that do not allow …
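
The snippet's contrast is architectural: an RNN must consume a sequence one step at a time, whereas attention-based models (as in this paper) relate all time frames to each other with a few matrix products. A minimal NumPy sketch of scaled dot-product self-attention is shown below; the shapes and random weights are illustrative assumptions, not the paper's configuration.

```python
# Minimal scaled dot-product self-attention over a whole sequence at once,
# illustrating why attention avoids the step-by-step recurrence of an RNN.
# Shapes and weights are illustrative, not the SepFormer configuration.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """x: (T, d) sequence of T frames; returns (T, d) computed in parallel over T."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v          # project all frames at once
    scores = q @ k.T / np.sqrt(k.shape[-1])      # (T, T) pairwise interactions
    return softmax(scores) @ v                   # every frame attends to every frame

rng = np.random.default_rng(0)
T, d = 100, 64                                   # e.g. 100 time frames of 64-dim features
x = rng.standard_normal((T, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
y = self_attention(x, w_q, w_k, w_v)
print(y.shape)                                   # (100, 64): no sequential loop over T
```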

Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis

P Ochieng - Artificial Intelligence Review, 2023 - Springer
Deep neural network (DNN) techniques have become pervasive in domains such as
natural language processing and computer vision. They have achieved great success in …

Continuous speech separation with conformer

S Chen, Y Wu, Z Chen, J Wu, J Li… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Continuous speech separation was recently proposed to deal with the overlapped speech in
natural conversations. While it was shown to significantly improve the speech recognition …

Gated recurrent fusion with joint training framework for robust end-to-end speech recognition

C Fan, J Yi, J Tao, Z Tian, B Liu… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Joint training frameworks for speech enhancement and recognition have achieved good
performance for robust end-to-end automatic speech recognition …
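
The joint training referred to here follows a common pattern: an enhancement front-end and an E2E ASR back-end are optimized together under a single weighted loss so that gradients reach both modules. The sketch below illustrates only that generic pattern under assumed stand-in modules and an assumed weight lam; the paper's gated recurrent fusion of noisy and enhanced features is not reproduced.

```python
# Generic joint-training pattern for an enhancement front-end plus an E2E ASR
# back-end: one weighted loss, gradients flow through both modules. The module
# definitions, CTC setup, and weight lam are illustrative assumptions.
import torch
import torch.nn as nn

class Enhancer(nn.Module):            # stand-in enhancement front-end
    def __init__(self, dim=80):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, dim))
    def forward(self, noisy):
        return self.net(noisy)        # (B, T, dim) enhanced features

class Recognizer(nn.Module):          # stand-in E2E ASR back-end with a CTC head
    def __init__(self, dim=80, vocab=32):
        super().__init__()
        self.rnn = nn.GRU(dim, 256, batch_first=True)
        self.out = nn.Linear(256, vocab)
    def forward(self, feats):
        h, _ = self.rnn(feats)
        return self.out(h).log_softmax(-1)   # (B, T, vocab) log-probabilities

enh, asr = Enhancer(), Recognizer()
opt = torch.optim.Adam(list(enh.parameters()) + list(asr.parameters()), lr=1e-3)
ctc = nn.CTCLoss(blank=0)
lam = 0.3                              # illustrative weight on the enhancement loss

noisy = torch.randn(4, 120, 80)        # toy batch: (B, T, dim)
clean = torch.randn(4, 120, 80)
tokens = torch.randint(1, 32, (4, 20))
in_lens = torch.full((4,), 120)
tgt_lens = torch.full((4,), 20)

enhanced = enh(noisy)
logp = asr(enhanced).transpose(0, 1)   # CTCLoss expects (T, B, vocab)
loss = ctc(logp, tokens, in_lens, tgt_lens) + lam * nn.functional.mse_loss(enhanced, clean)
loss.backward()                        # gradients reach both modules
opt.step()
```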

Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition

AS Subramanian, C Weng, S Watanabe, M Yu… - Computer Speech & …, 2022 - Elsevier
Multi-source localization is an important and challenging technique for multi-talker
conversation analysis. This paper proposes a novel supervised learning method using deep …

ESPnet-SE: End-to-end speech enhancement and separation toolkit designed for ASR integration

C Li, J Shi, W Zhang, AS Subramanian… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
We present ESPnet-SE, which is designed for the quick development of speech
enhancement and speech separation systems in a single framework, along with the optional …

A study of transformer-based end-to-end speech recognition system for Kazakh language

M Orken, O Dina, A Keylan, T Tolganay, O Mohamed - Scientific reports, 2022 - nature.com
Today, the Transformer model, which allows parallelization and has a built-in self-attention
mechanism, has been widely used in the field of speech recognition. The great advantage of …

Automatic lyrics transcription of polyphonic music with lyrics-chord multi-task learning

X Gao, C Gupta, H Li - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
Lyrics are the words that make up a song, while chords are harmonic sets of multiple notes
in music. Lyrics and chords are generally essential information in music, i.e., unaccompanied …