Toward domain-invariant speech recognition via large scale training

A Narayanan, A Misra, KC Sim… - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org
Current state-of-the-art automatic speech recognition systems are trained to work in
specific 'domains', defined based on factors like application, sampling rate and codec. When …

A conformer-based ASR frontend for joint acoustic echo cancellation, speech enhancement and speech separation

T O'Malley, A Narayanan, Q Wang… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
We present a frontend for improving the robustness of automatic speech recognition (ASR) that
jointly implements three modules within a single model: acoustic echo cancellation, speech …

Cross-attention conformer for context modeling in speech enhancement for ASR

A Narayanan, CC Chiu, T O'Malley… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
This work introduces cross-attention conformer, an attention-based architecture for context
modeling in speech enhancement. Given that the context information can often be …

Leveraging native language information for improved accented speech recognition

S Ghorbani, JHL Hansen - arXiv preprint arXiv:1904.09038, 2019 - arxiv.org
Recognition of accented speech is a long-standing challenge for automatic speech
recognition (ASR) systems, given the increasing worldwide population of bilingual speakers …

Speaker adaptation for end-to-end CTC models

K Li, J Li, Y Zhao, K Kumar… - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org
We propose two approaches for speaker adaptation in end-to-end (E2E) automatic speech
recognition systems. One is Kullback-Leibler divergence (KLD) regularization and the other …
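
KLD regularization keeps an adapted model close to the original speaker-independent (SI) model. The sketch below is a minimal illustration, assuming the common interpolated-target formulation: training against (1 - rho) * hard label + rho * SI posterior equals (1 - rho) * CE + rho * KL(SI || adapted) plus a constant. Frame-level cross-entropy stands in for the CTC loss, and the function names are hypothetical; this is not the cited paper's exact recipe.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kld_regularized_loss(adapted_logits, si_probs, targets, rho):
    """Hypothetical frame-level KLD-regularized cross-entropy.

    adapted_logits: (T, V) logits of the model being adapted
    si_probs:       (T, V) posteriors of the frozen speaker-independent model
    targets:        (T,) integer labels per frame
    rho:            regularization weight in [0, 1]; 0 gives plain CE
    """
    log_p = np.log(softmax(adapted_logits) + 1e-12)
    one_hot = np.eye(log_p.shape[-1])[targets]
    # Interpolated soft target: pulls the adapted model toward the SI posteriors.
    soft_target = (1.0 - rho) * one_hot + rho * si_probs
    # Cross-entropy against the soft target, averaged over frames.
    return float(-(soft_target * log_p).sum(axis=-1).mean())
```

Larger rho trusts the SI model more, which is the usual guard against overfitting when only a little adaptation data is available for a speaker.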

Updating only encoders prevents catastrophic forgetting of end-to-end ASR models

Y Takashima, S Horiguchi, S Watanabe… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we present an incremental domain adaptation technique to prevent
catastrophic forgetting for an end-to-end automatic speech recognition (ASR) model …

Multi-domain adversarial training of neural network acoustic models for distant speech recognition

S Mirsamadi, JHL Hansen - Speech Communication, 2019 - Elsevier
Building deep neural network acoustic models directly based on far-field speech from
multiple recording environments with different acoustic properties is an increasingly popular …

Domain adaptation of end-to-end speech recognition in low-resource settings

L Samarakoon, B Mak, AYS Lam - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org
End-to-end automatic speech recognition (ASR) has simplified the traditional ASR system
building pipeline by eliminating the need to have multiple components and also the …

Advancing multi-accented LSTM-CTC speech recognition using a domain specific student-teacher learning paradigm

S Ghorbani, AE Bulut… - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org
Non-native speech causes automatic speech recognition systems to degrade in
performance. Past strategies to address this challenge have considered model adaptation …

Conditional conformer: Improving speaker modulation for single and multi-user speech enhancement

T O'Malley, S Ding, A Narayanan… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Recently, Feature-wise Linear Modulation (FiLM) has been shown to outperform other
approaches to incorporate speaker embedding into speech separation and VoiceFilter …
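
FiLM conditions a network on side information by predicting a per-channel scale (gamma) and shift (beta) from it. The following is a minimal sketch, assuming a single linear projection from the speaker embedding to each of gamma and beta, broadcast over time frames. The function and parameter names are hypothetical and do not reflect the cited model's architecture.

```python
import numpy as np


def film(features, speaker_embedding, w_gamma, b_gamma, w_beta, b_beta):
    """Hypothetical Feature-wise Linear Modulation layer.

    features:          (T, D) frame-level activations
    speaker_embedding: (E,) speaker vector
    w_gamma, w_beta:   (E, D) projection weights
    b_gamma, b_beta:   (D,) projection biases
    """
    # Per-channel scale and shift predicted from the speaker embedding.
    gamma = speaker_embedding @ w_gamma + b_gamma
    beta = speaker_embedding @ w_beta + b_beta
    # The same (D,) modulation is broadcast across all T frames.
    return gamma * features + beta
```

With zero weights, b_gamma set to ones and b_beta set to zeros, the layer reduces to the identity, which makes it easy to insert into an existing model.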