Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
MIMO-Speech: End-to-end multi-channel multi-speaker speech recognition
Recently, the end-to-end approach has proven its efficacy in monaural multi-speaker speech
recognition. However, high word error rates (WERs) still prevent these systems from being …
Improving noise robustness of automatic speech recognition via parallel data and teacher-student learning
For real-world speech recognition applications, noise robustness is still a challenge. In this
work, we adopt the teacher-student (T/S) learning technique using a parallel clean and noisy …
End-to-end multi-channel transformer for speech recognition
Transformers are powerful neural architectures that allow integrating different modalities
using attention mechanisms. In this paper, we leverage the neural transformer architectures …
Deep neural network-based generalized sidelobe canceller for dual-channel far-field speech recognition
The traditional generalized sidelobe canceller (GSC) is a common speech enhancement
front end to improve the noise robustness of automatic speech recognition (ASR) systems in …
Human listening and live captioning: Multi-task training for speech enhancement
With the surge of online meetings, it has become more critical than ever to provide high-
quality speech audio and live captioning under various noise conditions. However, most …
GAN-Based Data Generation for Speech Emotion Recognition
In this work, we propose a GAN-based method to generate synthetic data for speech
emotion recognition. Specifically, we investigate the usage of GANs for capturing the data …
An end-to-end architecture of online multi-channel speech separation
Multi-speaker speech recognition has been one of the key challenges in conversation
transcription as it breaks the single active speaker assumption employed by most state-of-the …
Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation
Self-supervised speech pre-training methods have developed rapidly in recent years, which
show to be very effective for many near-field single-channel speech tasks. However, far-field …
Self-attention channel combinator frontend for end-to-end multichannel far-field speech recognition
When a sufficiently large far-field training data is presented, jointly optimizing a multichannel
frontend and an end-to-end (E2E) Automatic Speech Recognition (ASR) backend shows …