Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is a lack of …

Disentangling voice and content with self-supervision for speaker recognition

T Liu, KA Lee, Q Wang, H Li - Advances in Neural …, 2023 - proceedings.neurips.cc
For speaker recognition, it is difficult to extract an accurate speaker representation from
speech because of its mixture of speaker traits and content. This paper proposes a …

On-the-fly data loader and utterance-level aggregation for speaker and language recognition

W Cai, J Chen, J Zhang, M Li - IEEE/ACM Transactions on …, 2020 - ieeexplore.ieee.org
In this article, our recent efforts on directly modeling utterance-level aggregation for speaker
and language recognition are summarized. First, an on-the-fly data loader for efficient network …

Leveraging ASR pretrained Conformers for speaker verification through transfer learning and knowledge distillation

D Cai, M Li - IEEE/ACM Transactions on Audio, Speech, and …, 2024 - ieeexplore.ieee.org
This paper focuses on the application of Conformers in speaker verification. Conformers,
initially designed for Automatic Speech Recognition (ASR), excel at modeling both local and …

CNN with phonetic attention for text-independent speaker verification

T Zhou, Y Zhao, J Li, Y Gong… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org
Text-independent speaker verification imposes no constraints on the spoken content and
usually needs long observations to make reliable predictions. In this paper, we propose two …

Multi-resolution multi-head attention in deep speaker embedding

Z Wang, K Yao, X Li, S Fang - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
Pooling is an essential component to capture long-term speaker characteristics for speaker
recognition. This paper proposes simple but effective pooling methods to compute attentive …

MEConformer: Highly representative embedding extractor for speaker verification via incorporating selective convolution into deep speaker encoder

Q Zheng, Z Chen, Z Wang, H Liu, M Lin - Expert Systems with Applications, 2024 - Elsevier
Transformer models have demonstrated superior performance across various domains,
including computer vision, natural language processing, and speech recognition. The …

SVoice: Enabling voice communication in silence via acoustic sensing on commodity devices

Y Fu, S Wang, L Zhong, L Chen, J Ren… - Proceedings of the 20th …, 2022 - dl.acm.org
Silent Speech Interface (SSI) has been proposed as a means of reconstructing audible
speech from silent articulatory gestures for covert voice communication in public and voice …

Improving continuous sign language recognition with consistency constraints and signer removal

R Zuo, B Mak - ACM Transactions on Multimedia Computing …, 2024 - dl.acm.org
Deep-learning-based continuous sign language recognition (CSLR) models typically consist
of a visual module, a sequential module, and an alignment module. However, the …

UltraSR: Silent speech reconstruction via acoustic sensing

Y Fu, S Wang, L Zhong, L Chen, J Ren… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Silent Speech Interfaces (SSI) have been developed to convert silent articulatory gestures
into speech, aiding communication in public spaces and assisting individuals with aphasia …