Neural target speech extraction: An overview

K Zmolikova, M Delcroix, T Ochiai… - IEEE Signal …, 2023 - ieeexplore.ieee.org
Humans can listen to a target speaker even in challenging acoustic conditions that have
noise, reverberation, and interfering speakers. This phenomenon is known as the cocktail …

Speakerbeam: Speaker aware neural network for target speaker extraction in speech mixtures

K Žmolíková, M Delcroix, K Kinoshita… - IEEE Journal of …, 2019 - ieeexplore.ieee.org
The processing of speech corrupted by interfering overlapping speakers is one of the
challenging problems with regards to today's automatic speech recognition systems …

Single channel target speaker extraction and recognition with speaker beam

M Delcroix, K Zmolikova, K Kinoshita… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
This paper addresses the problem of single channel speech recognition of a target speaker
in a mixture of speech signals. We propose to exploit auxiliary speaker information provided …

Lessons from building acoustic models with a million hours of speech

SHK Parthasarathi, N Strom - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
This is a report of our lessons learned building acoustic models from 1 Million hours of
unlabeled speech, while labeled speech is restricted to 7,000 hours. We employ …

Deep extractor network for target speaker recovery from single channel speech mixtures

J Wang, J Chen, D Su, L Chen, M Yu, Y Qian… - arXiv preprint arXiv …, 2018 - arxiv.org
Speaker-aware source separation methods are promising workarounds for major difficulties
such as arbitrary source permutation and unknown number of sources. However, it remains …

Bioinspired dual-channel speech recognition using graphene-based electromyographic and mechanical sensors

H Tian, X Li, Y Wei, S Ji, Q Yang, GY Gou… - Cell Reports Physical …, 2022 - cell.com
Automatic speech recognition (ASR) is helpful to improve quality of life. However, the
performance of ASR degrades in the case of noisy environment, limited privacy, and speech …

Improving noise robustness of automatic speech recognition via parallel data and teacher-student learning

L Mošner, M Wu, A Raju… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
For real-world speech recognition applications, noise robustness is still a challenge. In this
work, we adopt the teacher-student (T/S) learning technique using a parallel clean and noisy …

Developing far-field speaker system via teacher-student learning

J Li, R Zhao, Z Chen, C Liu, X Xiao… - … on Acoustics, Speech …, 2018 - ieeexplore.ieee.org
In this study, we develop the keyword spotting (KWS) and acoustic model (AM) components
in a far-field speaker system. Specifically, we use teacher-student (T/S) learning to adapt a …

Frequency domain multi-channel acoustic modeling for distant speech recognition

W Minhua, K Kumatani, S Sundaram… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
Conventional far-field automatic speech recognition (ASR) systems typically employ
microphone array techniques for speech enhancement in order to improve robustness …

Slot-triggered contextual biasing for personalized speech recognition using neural transducers

S Tong, P Harding, S Wiesler - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
End-to-end (E2E) automatic speech recognition (ASR) models have been found to perform
well on general transcription tasks but often fail to correctly recognize words that occur …