End-to-end multi-channel transformer for speech recognition

FJ Chang, M Radfar, A Mouchtaris… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Transformers are powerful neural architectures that allow integrating different modalities
using attention mechanisms. In this paper, we leverage the neural transformer architectures …

Deep neural network-based generalized sidelobe canceller for dual-channel far-field speech recognition

G Li, S Liang, S Nie, W Liu, Z Yang - Neural Networks, 2021 - Elsevier
The traditional generalized sidelobe canceller (GSC) is a common speech enhancement
front end to improve the noise robustness of automatic speech recognition (ASR) systems in …

Multi-channel transformer transducer for speech recognition

FJ Chang, M Radfar, A Mouchtaris… - arXiv preprint arXiv …, 2021 - arxiv.org
Multi-channel inputs offer several advantages over single-channel inputs for improving the
robustness of on-device speech recognition systems. Recent work on multi-channel …

Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation

Q Zhu, J Zhang, Y Gu, Y Hu, L Dai - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Self-supervised speech pre-training methods have developed rapidly in recent years and have
proven very effective for many near-field single-channel speech tasks. However, far-field …

Robust multi-channel speech recognition using frequency aligned network

T Park, K Kumatani, M Wu… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
Conventional speech enhancement techniques such as beamforming have known benefits for
far-field speech recognition. Our own work in frequency-domain multi-channel acoustic …

Dual-encoder architecture with encoder selection for joint close-talk and far-talk speech recognition

F Weninger, M Gaudesi, R Leibold… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
In this paper, we propose a dual-encoder ASR architecture for joint modeling of close-talk
(CT) and far-talk (FT) speech, in order to combine the advantages of CT and FT devices for …

[PDF] Extract, Adapt and Recognize: An End-to-End Neural Network for Corrupted Monaural Speech Recognition

MWY Lam, J Wang, X Liu, H Meng, D Su, D Yu - INTERSPEECH, 2019 - isca-archive.org
Automatic speech recognition (ASR) in challenging conditions, such as in the presence of
interfering speakers or music, remains an unsolved problem. This paper presents Extract …

[PDF] Multi-channel multi-speaker transformer for speech recognition

G Yifan, T Yao, S Hongbin, W Yulong - Proc. INTERSPEECH 2023, 2023 - isca-archive.org
With the development of teleconferencing and in-vehicle voice assistants, far-field multi-
speaker speech recognition has become a hot research topic. Recently, a multi-channel …

[PDF] Deep Neural Network-Based Generalized Sidelobe Canceller for Robust Multi-Channel Speech Recognition

G Li, S Liang, S Nie, W Liu, Z Yang, L Xiao - INTERSPEECH, 2020 - isca-archive.org
The elastic spatial filter (ESF) proposed in recent years is a popular multi-channel speech
enhancement front end based on deep neural network (DNN). It is suitable for real-time …

Multi-channel Opus compression for far-field automatic speech recognition with a fixed bitrate budget

L Drude, J Heymann, A Schwarz, JM Valin - arXiv preprint arXiv …, 2021 - arxiv.org
Automatic speech recognition (ASR) in the cloud allows the use of larger models and more
powerful multi-channel signal processing front-ends compared to on-device processing …