The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR

Y Liang, M Shi, F Yu, Y Li, S Zhang, Z Du… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
With the success of the first Multi-channel Multi-party Meeting Transcription challenge
(M2MeT), the second M2MeT challenge (M2MeT 2.0) held in ASRU2023 particularly aims to …

BA-SOT: Boundary-aware serialized output training for multi-talker ASR

Y Liang, F Yu, Y Li, P Guo, S Zhang, Q Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
The recently proposed serialized output training (SOT) simplifies multi-talker automatic
speech recognition (ASR) by generating speaker transcriptions separated by a special …

Automatic channel selection and spatial feature integration for multi-channel speech recognition across various array topologies

B Mu, P Guo, D Guo, P Zhou… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Automatic Speech Recognition (ASR) has shown remarkable progress, yet it still faces
challenges in real-world distant scenarios across various array topologies each with multiple …

Robust Wake Word Spotting With Frame-Level Cross-Modal Attention Based Audio-Visual Conformer

H Wang, M Cheng, Q Fu, M Li - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
In recent years, neural network-based Wake Word Spotting achieves good performance on
clean audio samples but struggles in noisy environments. Audio-Visual Wake Word Spotting …

End-to-end Multichannel Speaker-Attributed ASR: Speaker Guided Decoder and Input Feature Analysis

C Cui, I Sheikh, M Sadeghi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
We present an end-to-end multichannel speaker-attributed automatic speech recognition
(MC-SA-ASR) system that combines a Conformer-based encoder with multi-frame cross …

A comparative study on multichannel speaker-attributed automatic speech recognition in multi-party meetings

M Shi, J Zhang, Z Du, F Yu, Q Chen… - 2023 Asia Pacific …, 2023 - ieeexplore.ieee.org
Speaker-attributed automatic speech recognition (SA-ASR) in multi-party meeting scenarios
is one of the most valuable and challenging ASR tasks. It was shown that single-channel …

SA-Paraformer: Non-Autoregressive End-To-End Speaker-Attributed ASR

Y Li, F Yu, Y Liang, P Guo, M Shi, Z Du… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Joint modeling of multi-speaker ASR and speaker diarization has recently shown promising
results in speaker-attributed automatic speech recognition (SA-ASR). Although being able to …

A Streaming Multi-Channel End-to-End Speech Recognition System with Realistic Evaluations

X Kong, T Ning, H Huang, Z Ou - arXiv preprint arXiv:2407.09807, 2024 - arxiv.org
Recently multi-channel end-to-end (ME2E) ASR systems have emerged. While streaming
single-channel end-to-end ASR has been extensively studied, streaming ME2E ASR is …

The NPU System for DASR Task of CHiME-7 Challenge

B Mu, P Guo, H Wang, Y Li, Y Li, P Zhou… - Proc. CHiME …, 2023 - chimechallenge.org
This study describes the NPU system for the Distant Automatic Speech Recognition (DASR)
task of the CHiME-7 Challenge. Specifically, two attention-based channel selection modules …

Multi-Frame Cross-Channel Attention and Speaker Diarization Based Speaker-Attributed Automatic Speech Recognition System for Multi-Channel Multi-Party Meeting …

L Xu, H Yan, M He, Z Guo, Y Zhou, P Liu… - Journal of Shanghai …, 2024 - Springer
This paper describes a speaker-attributed automatic speech recognition (SA-ASR) system
submitted to the multi-channel multi-party meeting transcription challenge, which aims to …