Deep contextualized acoustic representations for semi-supervised speech recognition

S Ling, Y Liu, J Salazar… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
We propose a novel approach to semi-supervised automatic speech recognition (ASR). We
first exploit a large amount of unlabeled audio data via representation learning, where we …

Magic dust for cross-lingual adaptation of monolingual wav2vec-2.0

S Khurana, A Laurent, J Glass - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
We propose a simple and effective cross-lingual transfer learning method to adapt
monolingual wav2vec-2.0 models for Automatic Speech Recognition (ASR) in resource …

Semi-supervised speech recognition via graph-based temporal classification

N Moritz, T Hori, J Le Roux - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Semi-supervised learning has demonstrated promising results in automatic speech
recognition (ASR) by self-training using a seed ASR model with pseudo-labels generated for …

Semi-supervised learning with data augmentation for end-to-end ASR

F Weninger, F Mana, R Gemello… - arXiv preprint arXiv …, 2020 - arxiv.org
In this paper, we apply Semi-Supervised Learning (SSL) along with Data Augmentation (DA)
for improving the accuracy of End-to-End ASR. We focus on the consistency regularization …

A semi-supervised complementary joint training approach for low-resource speech recognition

YQ Du, J Zhang, X Fang, MH Wu… - IEEE/ACM Transactions …, 2023 - ieeexplore.ieee.org
Both unpaired speech and text have shown to be beneficial for low-resource automatic
speech recognition (ASR), which, however, were either separately used for pre-training, self …

Contextual semi-supervised learning: An approach to leverage air-surveillance and untranscribed ATC data in ASR systems

J Zuluaga-Gomez, I Nigmatulina, A Prasad… - arXiv preprint arXiv …, 2021 - arxiv.org
Air traffic management and specifically air-traffic control (ATC) rely mostly on voice
communications between Air Traffic Controllers (ATCos) and pilots. In most cases, these …

ILASR: privacy-preserving incremental learning for automatic speech recognition at production scale

G Chennupati, M Rao, G Chadha, A Eakin… - Proceedings of the 28th …, 2022 - dl.acm.org
Incremental learning is one paradigm to enable model building and updating at scale with
streaming data. For end-to-end automatic speech recognition (ASR) tasks, the absence of …

Modeling uncertainty in predicting emotional attributes from spontaneous speech

K Sridhar, C Busso - ICASSP 2020-2020 IEEE International …, 2020 - ieeexplore.ieee.org
A challenging task in affective computing is to build reliable speech emotion recognition
(SER) systems that can accurately predict emotional attributes from spontaneous speech. To …

Low Resource German ASR with Untranscribed Data Spoken by Non-native Children -- INTERSPEECH 2021 Shared Task SPAPL System

J Wang, Y Zhu, R Fan, W Chu, A Alwan - arXiv preprint arXiv:2106.09963, 2021 - arxiv.org
This paper describes the SPAPL system for the INTERSPEECH 2021 Challenge: Shared
Task on Automatic Speech Recognition for Non-Native Children's Speech in German. ~5 …

Semi-supervised end-to-end ASR via teacher-student learning with conditional posterior distribution

Z Zhang, Y Song, J Zhang, I McLoughlin, L Dai - 2020 - irr.singaporetech.edu.sg
Encoder-decoder based methods have become popular for automatic speech recognition
(ASR), thanks to their simplified processing stages and low reliance on prior knowledge …