SpeechBrain: A general-purpose speech toolkit

M Ravanelli, T Parcollet, P Plantinga, A Rouhe… - arXiv preprint arXiv …, 2021 - arxiv.org
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …

Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks

F Landini, J Profant, M Diez, L Burget - Computer Speech & Language, 2022 - Elsevier
The recently proposed VBx diarization method uses a Bayesian hidden Markov model to
find speaker clusters in a sequence of x-vectors. In this work we perform an extensive …

Deep speaker recognition: Process, progress, and challenges

AQ Ohi, MF Mridha, MA Hamid, MM Monowar - IEEE Access, 2021 - ieeexplore.ieee.org
Speaker recognition is related to human biometrics dealing with the identification of
speakers from their speech. Speaker recognition is an active research area and being …

TitaNet: Neural model for speaker representation with 1D depth-wise separable convolutions and global context

NR Koluguri, T Park, B Ginsburg - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
In this paper, we propose TitaNet, a novel neural network architecture for extracting speaker
representations. We employ 1D depth-wise separable convolutions with Squeeze-and …

ECAPA-TDNN embeddings for speaker diarization

N Dawalatabad, M Ravanelli, F Grondin… - arXiv preprint arXiv …, 2021 - arxiv.org
Learning robust speaker embeddings is a crucial step in speaker diarization. Deep neural
networks can accurately capture speaker discriminative characteristics and popular deep …

A survey of data augmentation methods for sequence data.

葛轶洲, 许翔, 杨锁荣, 周青… - Journal of Frontiers of …, 2021 - search.ebscohost.com
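In pursuit of accuracy, the architectures of deep learning models have become increasingly complex and the networks increasingly deep. A larger number of parameters means that
more data is needed to train the model. However, manually labeling data is costly and is constrained by objective factors …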

Meta-generalization for domain-invariant speaker verification

H Zhang, L Wang, KA Lee, M Liu… - … /ACM Transactions on …, 2023 - ieeexplore.ieee.org
Automatic speaker verification (ASV) exhibits unsatisfactory performance under domain
mismatch conditions owing to intrinsic and extrinsic factors, such as variations in speaking …

Combination of deep speaker embeddings for diarisation

G Sun, C Zhang, PC Woodland - Neural Networks, 2021 - Elsevier
Significant progress has recently been made in speaker diarisation after the introduction of
d-vectors as speaker embeddings extracted from neural network (NN) speaker classifiers for …

Optimized speaker change detection approach for speaker segmentation towards speaker diarization based on deep learning

K VijayKumar - Data & Knowledge Engineering, 2023 - Elsevier
Speaker diarization is the partitioning of an audio source stream into homogeneous
segments according to the speaker's identity. It can improve the readability of an automatic …

U-vectors: Generating clusterable speaker embedding from unlabeled data

MF Mridha, AQ Ohi, MM Monowar, MA Hamid… - Applied Sciences, 2021 - mdpi.com
Speaker recognition deals with recognizing speakers by their speech. Most speaker
recognition systems are built upon two stages, the first stage extracts low dimensional …