The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR
With the success of the first Multi-channel Multi-party Meeting Transcription challenge
(M2MeT), the second M2MeT challenge (M2MeT 2.0) held in ASRU2023 particularly aims to …
BA-SOT: Boundary-aware serialized output training for multi-talker ASR
The recently proposed serialized output training (SOT) simplifies multi-talker automatic
speech recognition (ASR) by generating speaker transcriptions separated by a special …
Automatic channel selection and spatial feature integration for multi-channel speech recognition across various array topologies
Automatic Speech Recognition (ASR) has shown remarkable progress, yet it still faces
challenges in real-world distant scenarios across various array topologies each with multiple …
Robust Wake Word Spotting With Frame-Level Cross-Modal Attention Based Audio-Visual Conformer
In recent years, neural network-based Wake Word Spotting achieves good performance on
clean audio samples but struggles in noisy environments. Audio-Visual Wake Word Spotting …
End-to-end Multichannel Speaker-Attributed ASR: Speaker Guided Decoder and Input Feature Analysis
We present an end-to-end multichannel speaker-attributed automatic speech recognition
(MC-SA-ASR) system that combines a Conformer-based encoder with multi-frame cross …
A comparative study on multichannel speaker-attributed automatic speech recognition in multi-party meetings
Speaker-attributed automatic speech recognition (SA-ASR) in multi-party meeting scenarios
is one of the most valuable and challenging ASR tasks. It was shown that single-channel …
SA-Paraformer: Non-Autoregressive End-To-End Speaker-Attributed ASR
Joint modeling of multi-speaker ASR and speaker diarization has recently shown promising
results in speaker-attributed automatic speech recognition (SA-ASR). Although being able to …
A Streaming Multi-Channel End-to-End Speech Recognition System with Realistic Evaluations
X Kong, T Ning, H Huang, Z Ou - arXiv preprint arXiv:2407.09807, 2024 - arxiv.org
Recently multi-channel end-to-end (ME2E) ASR systems have emerged. While streaming
single-channel end-to-end ASR has been extensively studied, streaming ME2E ASR is …
The NPU System for DASR Task of CHiME-7 Challenge
This study describes the NPU system for the Distant Automatic Speech Recognition (DASR)
task of the CHiME-7 Challenge. Specifically, two attention-based channel selection modules …
Multi-Frame Cross-Channel Attention and Speaker Diarization Based Speaker-Attributed Automatic Speech Recognition System for Multi-Channel Multi-Party Meeting …
L Xu, H Yan, M He, Z Guo, Y Zhou, P Liu… - Journal of Shanghai …, 2024 - Springer
This paper describes a speaker-attributed automatic speech recognition (SA-ASR) system
submitted to the multi-channel multi-party meeting transcription challenge, which aims to …