End-to-end multi-channel transformer for speech recognition
Transformers are powerful neural architectures that allow integrating different modalities
using attention mechanisms. In this paper, we leverage the neural transformer architectures …
Deep neural network-based generalized sidelobe canceller for dual-channel far-field speech recognition
The traditional generalized sidelobe canceller (GSC) is a common speech enhancement
front end to improve the noise robustness of automatic speech recognition (ASR) systems in …
Multi-channel transformer transducer for speech recognition
Multi-channel inputs offer several advantages over single-channel, to improve the
robustness of on-device speech recognition systems. Recent work on multi-channel …
Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation
Self-supervised speech pre-training methods have developed rapidly in recent years, which
show to be very effective for many near-field single-channel speech tasks. However, far-field …
Robust multi-channel speech recognition using frequency aligned network
Conventional speech enhancement technique such as beamforming has known benefits for
far-field speech recognition. Our own work in frequency-domain multi-channel acoustic …
Dual-encoder architecture with encoder selection for joint close-talk and far-talk speech recognition
F Weninger, M Gaudesi, R Leibold… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
In this paper, we propose a dual-encoder ASR architecture for joint modeling of close-talk
(CT) and far-talk (FT) speech, in order to combine the advantages of CT and FT devices for …
Extract, Adapt and Recognize: An End-to-End Neural Network for Corrupted Monaural Speech Recognition
Automatic speech recognition (ASR) in challenging conditions, such as in the presence of
interfering speakers or music, remains an unsolved problem. This paper presents Extract …
Multi-channel multi-speaker transformer for speech recognition
G Yifan, T Yao, S Hongbin, W Yulong - Proc. INTERSPEECH 2023, 2023 - isca-archive.org
With the development of teleconferencing and in-vehicle voice assistants, far-field multi-
speaker speech recognition has become a hot research topic. Recently, a multi-channel …
Deep Neural Network-Based Generalized Sidelobe Canceller for Robust Multi-Channel Speech Recognition
The elastic spatial filter (ESF) proposed in recent years is a popular multi-channel speech
enhancement front end based on deep neural network (DNN). It is suitable for real-time …
Multi-channel Opus compression for far-field automatic speech recognition with a fixed bitrate budget
Automatic speech recognition (ASR) in the cloud allows the use of larger models and more
powerful multi-channel signal processing front-ends compared to on-device processing …