Improving Aggregation and Loss Function for Better Embedding Learning in End-to-End Speaker...

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier

Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is lack of …

Cited by 419 · Related articles · All 9 versions

ECAPA-TDNN: Emphasized channel attention, propagation and aggregation in TDNN based speaker verification

B Desplanques, J Thienpondt, K Demuynck - arXiv preprint arXiv …, 2020 - arxiv.org

Current speaker verification techniques rely on a neural network to extract speaker
representations. The successful x-vector architecture is a Time Delay Neural Network …

Cited by 1578 · Related articles · All 15 versions

Deep learning methods in speaker recognition: a review

D Sztahó, G Szaszák, A Beke - arXiv preprint arXiv:1911.06615, 2019 - arxiv.org

This paper summarizes the applied deep learning practices in the field of speaker
recognition, both verification and identification. Speaker recognition has been a widely used …

Cited by 80 · Related articles · All 8 versions

MFA-Conformer: Multi-scale feature aggregation conformer for automatic speaker verification

Y Zhang, Z Lv, H Wu, S Zhang, P Hu, Z Wu… - arXiv preprint arXiv …, 2022 - arxiv.org

In this paper, we present Multi-scale Feature Aggregation Conformer (MFA-Conformer), an
easy-to-implement, simple but effective backbone for automatic speaker verification based …

Cited by 142 · Related articles · All 6 versions

CN-Celeb: multi-genre speaker recognition

L Li, R Liu, J Kang, Y Fan, H Cui, Y Cai, R Vipperla… - Speech …, 2022 - Elsevier

Research on speaker recognition is extending to address the vulnerability in the wild
conditions, among which genre mismatch is perhaps the most challenging, for instance …

Cited by 128 · Related articles · All 7 versions

ECAPA-TDNN embeddings for speaker diarization

N Dawalatabad, M Ravanelli, F Grondin… - arXiv preprint arXiv …, 2021 - arxiv.org

Learning robust speaker embeddings is a crucial step in speaker diarization. Deep neural
networks can accurately capture speaker discriminative characteristics and popular deep …

Cited by 120 · Related articles · All 14 versions

Integrating frequency translational invariance in TDNNs and frequency positional information in 2D ResNets to enhance speaker verification

J Thienpondt, B Desplanques, K Demuynck - arXiv preprint arXiv …, 2021 - arxiv.org

This paper describes the IDLab submission for the text-independent task of the Short-
duration Speaker Verification Challenge 2021 (SdSVC-21). This speaker verification …

Cited by 84 · Related articles · All 11 versions

Densely Connected Time Delay Neural Network for Speaker Verification

YQ Yu, WJ Li - Interspeech, 2020 - cs.nju.edu.cn

Time delay neural network (TDNN) has been widely used in speaker verification tasks.
Recently, two TDNN-based models, including extended TDNN (E-TDNN) and factorized …

Cited by 74 · Related articles · All 6 versions

AutoSpeech: Neural architecture search for speaker recognition

S Ding, T Chen, X Gong, W Zha, Z Wang - arXiv preprint arXiv:2005.03215, 2020 - arxiv.org

Speaker recognition systems based on Convolutional Neural Networks (CNNs) are often
built with off-the-shelf backbones such as VGG-Net or ResNet. However, these backbones …

Cited by 63 · Related articles · All 8 versions

RawNeXt: Speaker verification system for variable-duration utterances with deep layer aggregation and extended dynamic scaling policies

J Kim, H Shim, J Heo, HJ Yu - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org

Despite achieving satisfactory performance in speaker verification using deep neural
networks, variable-duration utterances remain a challenge that threatens the robustness of …

Cited by 29 · Related articles · All 5 versions