Speaker recognition based on deep learning: An overview
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is lack of …
A review of speaker diarization: Recent advances with deep learning
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …
Self-attentive speaker embeddings for text-independent speaker verification
This paper introduces a new method to extract speaker embeddings from a deep
neural network (DNN) for text-independent speaker verification. Usually, speaker …
Speech resynthesis from discrete disentangled self-supervised representations
We propose using self-supervised discrete representations for the task of speech
resynthesis. To generate disentangled representation, we separately extract low-bitrate …
VoxCeleb: Large-scale speaker verification in the wild
The objective of this work is speaker recognition under noisy and unconstrained conditions.
We make two key contributions. First, we introduce a very large-scale audio-visual dataset …
ContentVec: An improved self-supervised speech representation by disentangling speakers
Self-supervised learning in speech involves training a speech representation network on a
large-scale unannotated speech corpus, and then applying the learned representations to …
Speaker recognition from raw waveform with SincNet
M Ravanelli, Y Bengio - 2018 IEEE spoken language …, 2018 - ieeexplore.ieee.org
Deep learning is progressively gaining popularity as a viable alternative to i-vectors for
speaker recognition. Promising results have been recently obtained with Convolutional …
X-vectors: Robust DNN embeddings for speaker recognition
In this paper, we use data augmentation to improve performance of deep neural network
(DNN) embeddings for speaker recognition. The DNN, which is trained to discriminate …
AutoVC: Zero-shot voice style transfer with only autoencoder loss
Despite the progress in voice conversion, many-to-many voice conversion trained on non-
parallel data, as well as zero-shot voice conversion, remains under-explored. Deep style …
Transfer learning from speaker verification to multispeaker text-to-speech synthesis
We describe a neural network-based system for text-to-speech (TTS) synthesis that is able to
generate speech audio in the voice of many different speakers, including those unseen …