MFCCA: Multi-Frame Cross-Channel attention for multi-speaker ASR in Multi-party meeting scenario

The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR

Y Liang, M Shi, F Yu, Y Li, S Zhang, Z Du… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

With the success of the first Multi-channel Multi-party Meeting Transcription challenge
(M2MeT), the second M2MeT challenge (M2MeT 2.0) held in ASRU2023 particularly aims to …

Cited by 2 - Related articles - All 4 versions

[PDF] arxiv.org

BA-SOT: Boundary-aware serialized output training for multi-talker ASR

Y Liang, F Yu, Y Li, P Guo, S Zhang, Q Chen… - arXiv preprint arXiv …, 2023 - arxiv.org

The recently proposed serialized output training (SOT) simplifies multi-talker automatic
speech recognition (ASR) by generating speaker transcriptions separated by a special …

Cited by 5 - Related articles - All 5 versions

[PDF] arxiv.org

Automatic channel selection and spatial feature integration for multi-channel speech recognition across various array topologies

B Mu, P Guo, D Guo, P Zhou… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Automatic Speech Recognition (ASR) has shown remarkable progress, yet it still faces
challenges in real-world distant scenarios across various array topologies each with multiple …

Cited by 1 - Related articles - All 3 versions

[PDF] arxiv.org

Robust Wake Word Spotting With Frame-Level Cross-Modal Attention Based Audio-Visual Conformer

H Wang, M Cheng, Q Fu, M Li - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org

In recent years, neural network-based Wake Word Spotting achieves good performance on
clean audio samples but struggles in noisy environments. Audio-Visual Wake Word Spotting …

Cited by 1 - Related articles - All 4 versions

[PDF] arxiv.org

End-to-end Multichannel Speaker-Attributed ASR: Speaker Guided Decoder and Input Feature Analysis

C Cui, I Sheikh, M Sadeghi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

We present an end-to-end multichannel speaker-attributed automatic speech recognition
(MC-SA-ASR) system that combines a Conformer-based encoder with multi-frame cross …

Cited by 1 - Related articles - All 10 versions

[PDF] arxiv.org

A comparative study on multichannel speaker-attributed automatic speech recognition in multi-party meetings

M Shi, J Zhang, Z Du, F Yu, Q Chen… - 2023 Asia Pacific …, 2023 - ieeexplore.ieee.org

Speaker-attributed automatic speech recognition (SA-ASR) in multi-party meeting scenarios
is one of the most valuable and challenging ASR tasks. It was shown that single-channel …

Cited by 4 - Related articles - All 3 versions

[PDF] arxiv.org

SA-Paraformer: Non-Autoregressive End-To-End Speaker-Attributed ASR

Y Li, F Yu, Y Liang, P Guo, M Shi, Z Du… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

Joint modeling of multi-speaker ASR and speaker diarization has recently shown promising
results in speaker-attributed automatic speech recognition (SA-ASR). Although being able to …

A Streaming Multi-Channel End-to-End Speech Recognition System with Realistic Evaluations

X Kong, T Ning, H Huang, Z Ou - arXiv preprint arXiv:2407.09807, 2024 - arxiv.org

Recently multi-channel end-to-end (ME2E) ASR systems have emerged. While streaming
single-channel end-to-end ASR has been extensively studied, streaming ME2E ASR is …

[PDF] The NPU System for DASR Task of CHiME-7 Challenge

B Mu, P Guo, H Wang, Y Li, Y Li, P Zhou… - Proc. CHiME …, 2023 - chimechallenge.org

This study describes the NPU system for the Distant Automatic Speech Recognition (DASR)
task of the CHiME-7 Challenge. Specifically, two attention-based channel selection modules …

Multi-Frame Cross-Channel Attention and Speaker Diarization Based Speaker-Attributed Automatic Speech Recognition System for Multi-Channel Multi-Party Meeting …

L Xu, H Yan, M He, Z Guo, Y Zhou, P Liu… - Journal of Shanghai …, 2024 - Springer

This paper describes a speaker-attributed automatic speech recognition (SA-ASR) system
submitted to the multi-channel multi-party meeting transcription challenge, which aims to …