SpEx: Multi-scale time domain speaker extraction network
Speaker extraction aims to mimic humans' selective auditory attention by extracting a target
speaker's voice from a multi-talker environment. It is common to perform the extraction in …
Single channel target speaker extraction and recognition with speaker beam
This paper addresses the problem of single channel speech recognition of a target speaker
in a mixture of speech signals. We propose to exploit auxiliary speaker information provided …
Recent progresses in deep learning based acoustic models
In this paper, we summarize recent progresses made in deep learning based acoustic
models and the motivation and insights behind the surveyed techniques. We first discuss …
Deep extractor network for target speaker recovery from single channel speech mixtures
Speaker-aware source separation methods are promising workarounds for major difficulties
such as arbitrary source permutation and unknown number of sources. However, it remains …
Robust sound event detection in bioacoustic sensor networks
Bioacoustic sensors, sometimes known as autonomous recording units (ARUs), can record
sounds of wildlife over long periods of time in scalable and minimally invasive ways …
Time-domain speaker extraction network
Speaker extraction is to extract a target speaker's voice from multi-talker speech. It simulates
humans' cocktail party effect or the selective listening ability. The prior work mostly performs …
Optimization of speaker extraction neural network with magnitude and temporal spectrum approximation loss
The SpeakerBeam-FE (SBF) method is proposed for speaker extraction. It attempts to
overcome the problem of unknown number of speakers in an audio recording during source …
Factorized hidden layer adaptation for deep neural network based acoustic modeling
L Samarakoon, KC Sim - IEEE/ACM Transactions on Audio …, 2016 - ieeexplore.ieee.org
In this paper, we propose the factorized hidden layer (FHL) approach to adapt the deep
neural network (DNN) acoustic models for automatic speech recognition (ASR). FHL aims at …
Residual language model for end-to-end speech recognition
E Tsunoo, Y Kashiwagi, C Narisetty… - arXiv preprint arXiv …, 2022 - arxiv.org
End-to-end automatic speech recognition suffers from adaptation to unknown target domain
speech despite being trained with a large amount of paired audio--text data. Recent studies …
Learning speaker representation for neural network based multichannel speaker extraction
Recently, schemes employing deep neural networks (DNNs) for extracting speech from
noisy observation have demonstrated great potential for noise robust automatic speech …