Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
A review of speaker diarization: Recent advances with deep learning
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …
Recent progresses in deep learning based acoustic models
In this paper, we summarize recent progresses made in deep learning based acoustic
models and the motivation and insights behind the surveyed techniques. We first discuss …
Serialized output training for end-to-end overlapped speech recognition
This paper proposes serialized output training (SOT), a novel framework for multi-speaker
overlapped speech recognition based on an attention-based encoder-decoder approach …
Streaming multi-talker ASR with token-level serialized output training
This paper proposes a token-level serialized output training (t-SOT), a novel framework for
streaming multi-talker automatic speech recognition (ASR). Unlike existing streaming multi …
End-to-end multi-speaker speech recognition with transformer
Recently, fully recurrent neural network (RNN) based end-to-end models have been proven
to be effective for multi-speaker speech recognition in both the single-channel and multi …
Joint speaker counting, speech recognition, and speaker identification for overlapped speech of any number of speakers
We propose an end-to-end speaker-attributed automatic speech recognition model that
unifies speaker counting, speech recognition, and speaker identification on monaural …
Deep extractor network for target speaker recovery from single channel speech mixtures
Speaker-aware source separation methods are promising workarounds for major difficulties
such as arbitrary source permutation and unknown number of sources. However, it remains …
Automatic lyrics transcription of polyphonic music with lyrics-chord multi-task learning
Lyrics are the words that make up a song, while chords are harmonic sets of multiple notes
in music. Lyrics and chords are generally essential information in music, i.e., unaccompanied …
Past review, current progress, and challenges ahead on the cocktail party problem
The cocktail party problem, i.e., tracing and recognizing the speech of a specific speaker
when multiple speakers talk simultaneously, is one of the critical problems yet to be solved …