Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is a lack of …

Zero-shot multi-speaker text-to-speech with state-of-the-art neural speaker embeddings

E Cooper, CI Lai, Y Yasuda, F Fang… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
While speaker adaptation for end-to-end speech synthesis using speaker embeddings can
produce good speaker similarity for speakers seen during training, there remains a gap for …

Improved RawNet with feature map scaling for text-independent speaker verification using raw waveforms

J Jung, S Kim, H Shim, J Kim, HJ Yu - arXiv preprint arXiv:2004.00526, 2020 - arxiv.org
Recent advances in deep learning have facilitated the design of speaker verification
systems that directly input raw waveforms. For example, RawNet extracts speaker …

Meta-learning for short utterance speaker recognition with imbalance length pairs

SM Kye, Y Jung, HB Lee, SJ Hwang, H Kim - arXiv preprint arXiv …, 2020 - arxiv.org
In practical settings, a speaker recognition system needs to identify a speaker given a short
utterance, while the enrollment utterance may be relatively long. However, existing speaker …

Improving multi-scale aggregation using feature pyramid module for robust speaker verification of variable-duration utterances

Y Jung, SM Kye, Y Choi, M Jung, H Kim - arXiv preprint arXiv:2004.03194, 2020 - arxiv.org
Currently, the most widely used approach for speaker verification is the deep speaker
embedding learning. In this approach, we obtain a speaker embedding vector by pooling …

Graph attentive feature aggregation for text-independent speaker verification

H Shim, J Heo, JH Park, GH Lee… - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
The objective of this paper is to combine multiple frame-level features into a single
utterance-level representation considering pair-wise relationships. For this purpose, we propose a …

Double multi-head attention for speaker verification

M India, P Safari, J Hernando - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Most state-of-the-art Deep Learning systems for text-independent speaker verification are
based on speaker embedding extractors. These architectures are commonly composed of a …

Towards improving synthetic audio spoofing detection robustness via meta-learning and disentangled training with adversarial examples

Z Wang, JHL Hansen - IEEE Access, 2024 - ieeexplore.ieee.org
Advances in automatic speaker verification (ASV) promote research into the formulation of
spoofing detection systems for real-world applications. The performance of ASV systems can …

Deep MOS predictor for synthetic speech using cluster-based modeling

Y Choi, Y Jung, H Kim - arXiv preprint arXiv:2008.03710, 2020 - arxiv.org
While deep learning has made impressive progress in speech synthesis and voice
conversion, the assessment of the synthesized speech is still carried out by human …

A unified deep learning framework for short-duration speaker verification in adverse environments

Y Jung, Y Choi, H Lim, H Kim - IEEE Access, 2020 - ieeexplore.ieee.org
Speaker verification (SV) has recently attracted considerable research interest due to the
growing popularity of virtual assistants. At the same time, there is an increasing requirement …