A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Audio Anti-Spoofing Detection: A Survey

M Li, Y Ahmadiadli, XP Zhang - arXiv preprint arXiv:2404.13914, 2024 - arxiv.org
The availability of smart devices leads to an exponential increase in multimedia content.
However, the rapid advancements in deep learning have given rise to sophisticated …

Graph attention-based deep embedded clustering for speaker diarization

Y Wei, H Guo, Z Ge, Z Yang - Speech Communication, 2023 - Elsevier
Deep speaker embedding extraction models have recently served as the cornerstone for
modular speaker diarization systems. However, in current modular systems, the extracted …

Speaker verification using attentive multi-scale convolutional recurrent network

Y Li, Z Jiang, W Cao, Q Huang - Applied Soft Computing, 2022 - Elsevier
In this paper, we propose a speaker verification method by an Attentive Multi-scale
Convolutional Recurrent Network (AMCRN). The proposed AMCRN can acquire both local …

Class token and knowledge distillation for multi-head self-attention speaker verification systems

V Mingote, A Miguel, A Ortega, E Lleida - Digital Signal Processing, 2023 - Elsevier
This paper explores three novel approaches to improve the performance of speaker
verification (SV) systems based on deep neural networks (DNN) using Multi-head Self …

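Speaker verification method based on cross-domain attentive feature fusion

杨震, 王天朗, 郭海燕, 王婷婷 - Journal on Communications (通信学报), 2023 - infocomm-journal.com
To address the lack of structural information between speech signal samples in the front-end
features of current speaker verification systems, a speaker verification method based on
cross-domain attentive feature fusion is proposed. First, a graph frequency-domain feature based on graph signal processing is proposed …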

End-to-end deep speaker embedding learning using multi-scale attentional fusion and graph neural networks

HB Kashani, S Jazmi - Expert Systems with Applications, 2023 - Elsevier
As an attractive research topic in biometric authentication, the Text Independent Speaker Verification
(TI-SV) problem aims to specify whether two given unconstrained utterances come from the …

One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification

J Heo, C Lim, J Kim, H Shin, HJ Yu - arXiv preprint arXiv:2305.17394, 2023 - arxiv.org
The application of speech self-supervised learning (SSL) models has achieved remarkable
performance in speaker verification (SV). However, there is a computational cost hurdle in …

Distance Metric-Based Open-Set Domain Adaptation for Speaker Verification

J Li, J Han, F Qian, T Zheng, Y He… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Domain shift poses a significant challenge in speaker verification, especially in open-set
scenarios where the speaker categories are disjoint between the source and target …

Speaker recognition using isomorphic graph attention network based pooling on self-supervised representation

Z Ge, X Xu, H Guo, T Wang, Z Yang - Applied Acoustics, 2024 - Elsevier
The emergence of self-supervised representation (i.e., wav2vec 2.0) allows
speaker-recognition approaches to process spoken signals through foundation models built on …