Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is a lack of …

A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

Self-attentive speaker embeddings for text-independent speaker verification

Y Zhu, T Ko, D Snyder, B Mak, D Povey - Interspeech, 2018 - pianshen.com
This paper introduces a new method to extract speaker embeddings from a deep
neural network (DNN) for text-independent speaker verification. Usually, speaker …

Speech resynthesis from discrete disentangled self-supervised representations

A Polyak, Y Adi, J Copet, E Kharitonov… - arXiv preprint arXiv …, 2021 - arxiv.org
We propose using self-supervised discrete representations for the task of speech
resynthesis. To generate disentangled representation, we separately extract low-bitrate …

VoxCeleb: Large-scale speaker verification in the wild

A Nagrani, JS Chung, W Xie, A Zisserman - Computer Speech & Language, 2020 - Elsevier
The objective of this work is speaker recognition under noisy and unconstrained conditions.
We make two key contributions. First, we introduce a very large-scale audio-visual dataset …

ContentVec: An improved self-supervised speech representation by disentangling speakers

K Qian, Y Zhang, H Gao, J Ni, CI Lai… - International …, 2022 - proceedings.mlr.press
Self-supervised learning in speech involves training a speech representation network on a
large-scale unannotated speech corpus, and then applying the learned representations to …

Speaker recognition from raw waveform with SincNet

M Ravanelli, Y Bengio - 2018 IEEE spoken language …, 2018 - ieeexplore.ieee.org
Deep learning is progressively gaining popularity as a viable alternative to i-vectors for
speaker recognition. Promising results have been recently obtained with Convolutional …

X-vectors: Robust DNN embeddings for speaker recognition

D Snyder, D Garcia-Romero, G Sell… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
In this paper, we use data augmentation to improve performance of deep neural network
(DNN) embeddings for speaker recognition. The DNN, which is trained to discriminate …

AutoVC: Zero-shot voice style transfer with only autoencoder loss

K Qian, Y Zhang, S Chang, X Yang… - International …, 2019 - proceedings.mlr.press
Despite the progress in voice conversion, many-to-many voice conversion trained on non-
parallel data, as well as zero-shot voice conversion, remains under-explored. Deep style …

Transfer learning from speaker verification to multispeaker text-to-speech synthesis

Y Jia, Y Zhang, R Weiss, Q Wang… - Advances in neural …, 2018 - proceedings.neurips.cc
We describe a neural network-based system for text-to-speech (TTS) synthesis that is able to
generate speech audio in the voice of many different speakers, including those unseen …