A survey on non-autoregressive generation for neural machine translation and beyond
Non-autoregressive (NAR) generation, which is first proposed in neural machine translation
(NMT) to speed up inference, has attracted much attention in both machine learning and …
M2MeT: The ICASSP 2022 multi-channel multi-party meeting transcription challenge
Recent development of speech signal processing, such as speech recognition, speaker
diarization, etc., has inspired numerous applications of speech technologies. The meeting …
A sidecar separator can convert a single-talker speech recognition system to a multi-talker one
Although automatic speech recognition (ASR) can perform well in common non-overlapping
environments, sustaining performance in multi-talker overlapping speech recognition …
Boundary and context aware training for CIF-based non-autoregressive end-to-end ASR
Continuous integrate-and-fire (CIF) based models, which use a soft and monotonic
alignment mechanism, have been well applied in non-autoregressive (NAR) speech …
BA-SOT: Boundary-aware serialized output training for multi-talker ASR
The recently proposed serialized output training (SOT) simplifies multi-talker automatic
speech recognition (ASR) by generating speaker transcriptions separated by a special …
Minimum word error training for non-autoregressive transformer-based code-switching ASR
Non-autoregressive end-to-end ASR framework might be potentially appropriate for code-
switching recognition task thanks to its inherent property that present output token being …
Conformer-based target-speaker automatic speech recognition for single-channel audio
Y Zhang, KC Puvvada, V Lavrukhin… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
We propose CONF-TSASR, a non-autoregressive end-to-end time-frequency domain
architecture for single-channel target-speaker automatic speech recognition (TS-ASR). The …
Cocktail HuBERT: Generalized Self-Supervised Pre-Training for Mixture and Single-Source Speech
M Fazel-Zarandi, WN Hsu - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Self-supervised learning leverages unlabeled data effectively, improving label efficiency and
generalization to domains without labeled data. While recent work has studied …
End-to-end Multichannel Speaker-Attributed ASR: Speaker Guided Decoder and Input Feature Analysis
We present an end-to-end multichannel speaker-attributed automatic speech recognition
(MC-SA-ASR) system that combines a Conformer-based encoder with multi-frame cross …
SA-Paraformer: Non-Autoregressive End-To-End Speaker-Attributed ASR
Joint modeling of multi-speaker ASR and speaker diarization has recently shown promising
results in speaker-attributed automatic speech recognition (SA-ASR). Although being able to …