A survey on non-autoregressive generation for neural machine translation and beyond

Y Xiao, L Wu, J Guo, J Li, M Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Non-autoregressive (NAR) generation, which is first proposed in neural machine translation
(NMT) to speed up inference, has attracted much attention in both machine learning and …

M2MeT: The ICASSP 2022 multi-channel multi-party meeting transcription challenge

F Yu, S Zhang, Y Fu, L Xie, S Zheng… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Recent development of speech signal processing, such as speech recognition, speaker
diarization, etc., has inspired numerous applications of speech technologies. The meeting …

A sidecar separator can convert a single-talker speech recognition system to a multi-talker one

L Meng, J Kang, M Cui, Y Wang, X Wu… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Although automatic speech recognition (ASR) can perform well in common non-overlapping
environments, sustaining performance in multi-talker overlapping speech recognition …

Boundary and context aware training for CIF-based non-autoregressive end-to-end ASR

F Yu, H Luo, P Guo, Y Liang, Z Yao… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
Continuous integrate-and-fire (CIF) based models, which use a soft and monotonic
alignment mechanism, have been well applied in non-autoregressive (NAR) speech …

BA-SOT: Boundary-aware serialized output training for multi-talker ASR

Y Liang, F Yu, Y Li, P Guo, S Zhang, Q Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
The recently proposed serialized output training (SOT) simplifies multi-talker automatic
speech recognition (ASR) by generating speaker transcriptions separated by a special …

Minimum word error training for non-autoregressive transformer-based code-switching ASR

Y Peng, J Zhang, H Xu, H Huang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Non-autoregressive end-to-end ASR framework might be potentially appropriate for
code-switching recognition task thanks to its inherent property that present output token being …

Conformer-based target-speaker automatic speech recognition for single-channel audio

Y Zhang, KC Puvvada, V Lavrukhin… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
We propose CONF-TSASR, a non-autoregressive end-to-end time-frequency domain
architecture for single-channel target-speaker automatic speech recognition (TS-ASR). The …

Cocktail HuBERT: Generalized Self-Supervised Pre-Training for Mixture and Single-Source Speech

M Fazel-Zarandi, WN Hsu - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Self-supervised learning leverages unlabeled data effectively, improving label efficiency and
generalization to domains without labeled data. While recent work has studied …

End-to-end Multichannel Speaker-Attributed ASR: Speaker Guided Decoder and Input Feature Analysis

C Cui, I Sheikh, M Sadeghi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
We present an end-to-end multichannel speaker-attributed automatic speech recognition
(MC-SA-ASR) system that combines a Conformer-based encoder with multi-frame cross …

SA-Paraformer: Non-Autoregressive End-To-End Speaker-Attributed ASR

Y Li, F Yu, Y Liang, P Guo, M Shi, Z Du… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Joint modeling of multi-speaker ASR and speaker diarization has recently shown promising
results in speaker-attributed automatic speech recognition (SA-ASR). Although being able to …