Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation

Q Zhu, J Zhang, Y Gu, Y Hu, L Dai - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Self-supervised speech pre-training methods have developed rapidly in recent years, which show to be very effective for many near-field single-channel speech tasks. However, far-field …

Self-attention channel combinator frontend for end-to-end multichannel far-field speech recognition

R Gong, C Quillen, D Sharma, A Goderre… - arXiv preprint arXiv …, 2021 - arxiv.org
When a sufficiently large far-field training data is presented, jointly optimizing a multichannel frontend and an end-to-end (E2E) Automatic Speech Recognition (ASR) backend shows …

Channel-combination algorithms for robust distant voice activity and overlapped speech detection

T Mariotte, A Larcher, S Montrésor… - … /ACM Transactions on …, 2024 - ieeexplore.ieee.org
Voice Activity Detection (VAD) and Overlapped Speech Detection (OSD) are key pre-processing tasks for speaker diarization. In the meeting context, it is often easier to capture …

Microphone array channel combination algorithms for overlapped speech detection

T Mariotte, A Larcher, S Montrésor… - … 2022 Human and …, 2022 - univ-lemans.hal.science
Overlapped speech occurs when multiple speakers are simultaneously active. This may lead to severe performance degradation in automatic speech processing systems such as …

Far-field speech recognition based on complex-valued neural networks and inter-frame similarity difference method

Y Guo, Y Chen, G Cheng, P Zhang… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
Far-field automatic speech recognition (ASR) is a challenging task due to the background noise and reverberation. To address this issue, we introduce a novel end-to-end multi …

Multi-channel multi-speaker transformer for speech recognition

G Yifan, T Yao, S Hongbin, W Yulong - Proc. INTERSPEECH 2023, 2023 - isca-archive.org
With the development of teleconferencing and in-vehicle voice assistants, far-field multi-speaker speech recognition has become a hot research topic. Recently, a multi-channel …

Automatic speech processing in meetings using distributed sensors

T Mariotte - 2024 - theses.hal.science
This thesis focuses on automatic speech processing, and more specifically on speaker diarization. This task requires segmenting the …

ChannelAugment: Improving generalization of multi-channel ASR by training with input channel randomization

M Gaudesi, F Weninger, D Sharma… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
End-to-end (E2E) multi-channel ASR systems show state-of-the-art performance in far-field ASR tasks by joint training of a multi-channel front-end along with the ASR model. The main …