SpEx: Multi-scale time domain speaker extraction network

C Xu, W Rao, ES Chng, H Li - IEEE/ACM Transactions on Audio …, 2020 - ieeexplore.ieee.org
Speaker extraction aims to mimic humans' selective auditory attention by extracting a target
speaker's voice from a multi-talker environment. It is common to perform the extraction in …

Single channel target speaker extraction and recognition with speaker beam

M Delcroix, K Žmolíková, K Kinoshita… - … on Acoustics, Speech …, 2018 - ieeexplore.ieee.org
This paper addresses the problem of single channel speech recognition of a target speaker
in a mixture of speech signals. We propose to exploit auxiliary speaker information provided …

Recent progresses in deep learning based acoustic models

D Yu, J Li - IEEE/CAA Journal of Automatica Sinica, 2017 - ieeexplore.ieee.org
In this paper, we summarize recent progresses made in deep learning based acoustic
models and the motivation and insights behind the surveyed techniques. We first discuss …

Deep extractor network for target speaker recovery from single channel speech mixtures

J Wang, J Chen, D Su, L Chen, M Yu, Y Qian… - arXiv preprint arXiv …, 2018 - arxiv.org
Speaker-aware source separation methods are promising workarounds for major difficulties
such as arbitrary source permutation and unknown number of sources. However, it remains …

Robust sound event detection in bioacoustic sensor networks

V Lostanlen, J Salamon, A Farnsworth, S Kelling… - PLOS ONE, 2019 - journals.plos.org
Bioacoustic sensors, sometimes known as autonomous recording units (ARUs), can record
sounds of wildlife over long periods of time in scalable and minimally invasive ways …

Time-domain speaker extraction network

C Xu, W Rao, ES Chng, H Li - 2019 IEEE Automatic Speech …, 2019 - ieeexplore.ieee.org
Speaker extraction is to extract a target speaker's voice from multi-talker speech. It simulates
humans' cocktail party effect or the selective listening ability. The prior work mostly performs …

Optimization of speaker extraction neural network with magnitude and temporal spectrum approximation loss

C Xu, W Rao, ES Chng, H Li - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
The SpeakerBeam-FE (SBF) method is proposed for speaker extraction. It attempts to
overcome the problem of unknown number of speakers in an audio recording during source …

Factorized hidden layer adaptation for deep neural network based acoustic modeling

L Samarakoon, KC Sim - IEEE/ACM Transactions on Audio …, 2016 - ieeexplore.ieee.org
In this paper, we propose the factorized hidden layer (FHL) approach to adapt the deep
neural network (DNN) acoustic models for automatic speech recognition (ASR). FHL aims at …

Residual language model for end-to-end speech recognition

E Tsunoo, Y Kashiwagi, C Narisetty… - arXiv preprint arXiv …, 2022 - arxiv.org
End-to-end automatic speech recognition suffers from adaptation to unknown target domain
speech despite being trained with a large amount of paired audio--text data. Recent studies …

Learning speaker representation for neural network based multichannel speaker extraction

K Žmolíková, M Delcroix, K Kinoshita… - 2017 IEEE Automatic …, 2017 - ieeexplore.ieee.org
Recently, schemes employing deep neural networks (DNNs) for extracting speech from
noisy observation have demonstrated great potential for noise robust automatic speech …