Speaker recognition based on deep learning: An overview
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has revolutionized speaker recognition. However, there is a lack of …
Disentangling voice and content with self-supervision for speaker recognition
For speaker recognition, it is difficult to extract an accurate speaker representation from
speech because of its mixture of speaker traits and content. This paper proposes a …
On-the-fly data loader and utterance-level aggregation for speaker and language recognition
In this article, our recent efforts on directly modeling utterance-level aggregation for speaker
and language recognition are summarized. First, an on-the-fly data loader for efficient network …
Leveraging ASR pretrained Conformers for speaker verification through transfer learning and knowledge distillation
This paper focuses on the application of Conformers in speaker verification. Conformers,
initially designed for Automatic Speech Recognition (ASR), excel at modeling both local and …
CNN with phonetic attention for text-independent speaker verification
Text-independent speaker verification imposes no constraints on the spoken content and
usually needs long observations to make reliable predictions. In this paper, we propose two …
Multi-resolution multi-head attention in deep speaker embedding
Pooling is an essential component to capture long-term speaker characteristics for speaker
recognition. This paper proposes simple but effective pooling methods to compute attentive …
MEConformer: Highly representative embedding extractor for speaker verification via incorporating selective convolution into deep speaker encoder
Transformer models have demonstrated superior performance across various domains,
including computer vision, natural language processing, and speech recognition. The …
Svoice: Enabling voice communication in silence via acoustic sensing on commodity devices
Silent Speech Interface (SSI) has been proposed as a means of reconstructing audible
speech from silent articulatory gestures for covert voice communication in public and voice …
Improving continuous sign language recognition with consistency constraints and signer removal
Deep-learning-based continuous sign language recognition (CSLR) models typically consist
of a visual module, a sequential module, and an alignment module. However, the …
Ultrasr: Silent speech reconstruction via acoustic sensing
Silent Speech Interfaces (SSI) have been developed to convert silent articulatory gestures
into speech, aiding communication in public spaces and assisting individuals with aphasia …