Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is lack of …

Deep learning methods in speaker recognition: a review

D Sztahó, G Szaszák, A Beke - arXiv preprint arXiv:1911.06615, 2019 - arxiv.org
This paper summarizes the applied deep learning practices in the field of speaker
recognition, both verification and identification. Speaker recognition has been a widely used …

MFA: TDNN with multi-scale frequency-channel attention for text-independent speaker verification with short utterances

T Liu, RK Das, KA Lee, H Li - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
The time delay neural network (TDNN) represents one of the state-of-the-art of neural
solutions to text-independent speaker verification. However, they require a large number of …

Densely Connected Time Delay Neural Network for Speaker Verification.

YQ Yu, WJ Li - Interspeech, 2020 - cs.nju.edu.cn
Time delay neural network (TDNN) has been widely used in speaker verification tasks.
Recently, two TDNN-based models, including extended TDNN (E-TDNN) and factorized …

Multi-view self-attention based transformer for speaker recognition

R Wang, J Ao, L Zhou, S Liu, Z Wei, T Ko… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Initially developed for natural language processing (NLP), Transformer model is now widely
used for speech processing tasks such as speaker recognition, due to its powerful sequence …

Improving multi-scale aggregation using feature pyramid module for robust speaker verification of variable-duration utterances

Y Jung, SM Kye, Y Choi, M Jung, H Kim - arXiv preprint arXiv:2004.03194, 2020 - arxiv.org
Currently, the most widely used approach for speaker verification is the deep speaker
embedding learning. In this approach, we obtain a speaker embedding vector by pooling …

Vector-based attentive pooling for text-independent speaker verification.

Y Wu, C Guo, H Gao, X Hou, J Xu - Interspeech, 2020 - interspeech2020.org
The pooling mechanism plays an important role in deep neural network based systems for
text-independent speaker verification, which aggregates the variable-length frame-level …

D-MONA: A dilated mixed-order non-local attention network for speaker and language recognition

X Miao, I McLoughlin, W Wang, P Zhang - Neural Networks, 2021 - Elsevier
Attention-based convolutional neural network (CNN) models are increasingly being adopted
for speaker and language recognition (SR/LR) tasks. These include time, frequency, spatial …

An effective deep embedding learning method based on dense-residual networks for speaker verification

Y Liu, Y Song, I McLoughlin, L Liu… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
In this paper, we present an effective end-to-end deep embedding learning method based
on Dense-Residual networks, which combine the advantages of a densely connected …

Global–local self-attention based transformer for speaker verification

F Xie, D Zhang, C Liu - Applied Sciences, 2022 - mdpi.com
Transformer models are now widely used for speech processing tasks due to their powerful
sequence modeling capabilities. Previous work determined an efficient way to model …