An Effective Deep Embedding Learning Architecture for Speaker Verification.

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier

Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is lack of …

Cited by 358 · Related articles · All 9 versions

[PDF] arxiv.org

Deep learning methods in speaker recognition: a review

D Sztahó, G Szaszák, A Beke - arXiv preprint arXiv:1911.06615, 2019 - arxiv.org

This paper summarizes the applied deep learning practices in the field of speaker
recognition, both verification and identification. Speaker recognition has been a widely used …

Cited by 73 · Related articles · All 8 versions

[PDF] arxiv.org

MFA: TDNN with multi-scale frequency-channel attention for text-independent speaker verification with short utterances

T Liu, RK Das, KA Lee, H Li - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org

The time delay neural network (TDNN) represents one of the state-of-the-art of neural
solutions to text-independent speaker verification. However, they require a large number of …

Cited by 69 · Related articles · All 5 versions

[PDF] nju.edu.cn

[PDF] Densely Connected Time Delay Neural Network for Speaker Verification.

YQ Yu, WJ Li - Interspeech, 2020 - cs.nju.edu.cn

Time delay neural network (TDNN) has been widely used in speaker verification tasks.
Recently, two TDNN-based models, including extended TDNN (E-TDNN) and factorized …

Cited by 64 · Related articles · All 6 versions

[PDF] arxiv.org

Multi-view self-attention based transformer for speaker recognition

R Wang, J Ao, L Zhou, S Liu, Z Wei, T Ko… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Initially developed for natural language processing (NLP), Transformer model is now widely
used for speech processing tasks such as speaker recognition, due to its powerful sequence …

Cited by 43 · Related articles · All 5 versions

[PDF] arxiv.org

Improving multi-scale aggregation using feature pyramid module for robust speaker verification of variable-duration utterances

Y Jung, SM Kye, Y Choi, M Jung, H Kim - arXiv preprint arXiv:2004.03194, 2020 - arxiv.org

Currently, the most widely used approach for speaker verification is the deep speaker
embedding learning. In this approach, we obtain a speaker embedding vector by pooling …

Cited by 46 · Related articles · All 8 versions

[PDF] interspeech2020.org

[PDF] Vector-based attentive pooling for text-independent speaker verification.

Y Wu, C Guo, H Gao, X Hou, J Xu - Interspeech, 2020 - interspeech2020.org

The pooling mechanism plays an important role in deep neural network based systems for
text-independent speaker verification, which aggregates the variable-length frame-level …

Cited by 26 · Related articles · All 4 versions

D-MONA: A dilated mixed-order non-local attention network for speaker and language recognition

X Miao, I McLoughlin, W Wang, P Zhang - Neural Networks, 2021 - Elsevier

Attention-based convolutional neural network (CNN) models are increasingly being adopted
for speaker and language recognition (SR/LR) tasks. These include time, frequency, spatial …

Cited by 15 · Related articles · All 6 versions

An effective deep embedding learning method based on dense-residual networks for speaker verification

Y Liu, Y Song, I McLoughlin, L Liu… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org

In this paper, we present an effective end-to-end deep embedding learning method based
on Dense-Residual networks, which combine the advantages of a densely connected …

Cited by 15 · Related articles · All 3 versions

[HTML] mdpi.com

[HTML] Global–local self-attention based transformer for speaker verification

F Xie, D Zhang, C Liu - Applied Sciences, 2022 - mdpi.com

Transformer models are now widely used for speech processing tasks due to their powerful
sequence modeling capabilities. Previous work determined an efficient way to model …

Cited by 8 · Related articles · All 4 versions