Frequency domain multi-channel acoustic modeling for distant speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com

Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Cited by 326 - Related articles - All 7 versions

[PDF] arxiv.org

MIMO-Speech: End-to-end multi-channel multi-speaker speech recognition

X Chang, W Zhang, Y Qian, J Le Roux… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org

Recently, the end-to-end approach has proven its efficacy in monaural multi-speaker speech
recognition. However, high word error rates (WERs) still prevent these systems from being …

Cited by 116 - Related articles - All 10 versions

[PDF] arxiv.org

Improving noise robustness of automatic speech recognition via parallel data and teacher-student learning

L Mošner, M Wu, A Raju… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org

For real-world speech recognition applications, noise robustness is still a challenge. In this
work, we adopt the teacher-student (T/S) learning technique using a parallel clean and noisy …

Cited by 72 - Related articles - All 10 versions

[PDF] arxiv.org

End-to-end multi-channel transformer for speech recognition

FJ Chang, M Radfar, A Mouchtaris… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

Transformers are powerful neural architectures that allow integrating different modalities
using attention mechanisms. In this paper, we leverage the neural transformer architectures …

Cited by 32 - Related articles - All 6 versions

Deep neural network-based generalized sidelobe canceller for dual-channel far-field speech recognition

G Li, S Liang, S Nie, W Liu, Z Yang - Neural Networks, 2021 - Elsevier

The traditional generalized sidelobe canceller (GSC) is a common speech enhancement
front end to improve the noise robustness of automatic speech recognition (ASR) systems in …

Cited by 27 - Related articles - All 5 versions

[PDF] arxiv.org

Human listening and live captioning: Multi-task training for speech enhancement

SE Eskimez, X Wang, M Tang, H Yang, Z Zhu… - arXiv preprint arXiv …, 2021 - arxiv.org

With the surge of online meetings, it has become more critical than ever to provide high-
quality speech audio and live captioning under various noise conditions. However, most …

Cited by 26 - Related articles - All 5 versions

[PDF] interspeech2020.org

[PDF] GAN-Based Data Generation for Speech Emotion Recognition.

SE Eskimez, D Dimitriadis, R Gmyr… - …, 2020 - interspeech2020.org

In this work, we propose a GAN-based method to generate synthetic data for speech
emotion recognition. Specifically, we investigate the usage of GANs for capturing the data …

Cited by 26 - Related articles - All 4 versions

[PDF] arxiv.org

An end-to-end architecture of online multi-channel speech separation

J Wu, Z Chen, J Li, T Yoshioka, Z Tan, E Lin… - arXiv preprint arXiv …, 2020 - arxiv.org

Multi-speaker speech recognition has been one of the key challenges in conversation
transcription as it breaks the single active speaker assumption employed by most state-of-the …

Cited by 24 - Related articles - All 3 versions

[PDF] aaai.org

Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation

Q Zhu, J Zhang, Y Gu, Y Hu, L Dai - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org

Self-supervised speech pre-training methods have developed rapidly in recent years, which
show to be very effective for many near-field single-channel speech tasks. However, far-field …

Cited by 1 - Related articles - All 3 versions

[PDF] arxiv.org

Self-attention channel combinator frontend for end-to-end multichannel far-field speech recognition

R Gong, C Quillen, D Sharma, A Goderre… - arXiv preprint arXiv …, 2021 - arxiv.org

When a sufficiently large far-field training data is presented, jointly optimizing a multichannel
frontend and an end-to-end (E2E) Automatic Speech Recognition (ASR) backend shows …

Cited by 15 - Related articles - All 8 versions