Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation

H Tak, M Todisco, X Wang, J Jung, J Yamagishi… - arXiv preprint arXiv …, 2022 - arxiv.org
The performance of spoofing countermeasure systems depends fundamentally upon the use
of sufficiently representative training data. With this usually being limited, current solutions …

Speech and speaker recognition using raw waveform modeling for adult and children's speech: A comprehensive review

K Radha, M Bansal, RB Pachori - Engineering Applications of Artificial …, 2024 - Elsevier
Conventionally, the extraction of hand-crafted acoustic features has been separated from the
task of establishing robust machine-learning models in speech processing. The manual …

Pushing the limits of raw waveform speaker recognition

J Jung, YJ Kim, HS Heo, BJ Lee, Y Kwon… - arXiv preprint arXiv …, 2022 - arxiv.org
In recent years, speaker recognition systems based on raw waveform inputs have received
increasing attention. However, the performance of such systems are typically inferior to the …

Frequency and multi-scale selective kernel attention for speaker verification

SH Mun, J Jung, MH Han… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
The majority of recent state-of-the-art speaker verification architectures adopt multi-scale
processing and frequency-channel attention mechanisms. Convolutional layers of these …

Short-segment speaker verification using ECAPA-TDNN with multi-resolution encoder

S Han, Y Ahn, K Kang, JW Shin - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Time-domain approaches have shown the potential to improve the performance of speaker
verification, but still predominant approaches utilize hand-crafted features such as the mel …

RSKNet-MTSP: Effective and portable deep architecture for speaker verification

Y Wu, C Guo, J Zhao, X Jin, J Xu - Neurocomputing, 2022 - Elsevier
The convolutional neural network (CNN) based approaches have shown great success for
speaker verification (SV) tasks, where modeling long temporal context and reducing …

Fisher ratio-based multi-domain frame-level feature aggregation for short utterance speaker verification

Y Zi, S Xiong - Engineering Applications of Artificial Intelligence, 2024 - Elsevier
As the durations of the short utterances are small, it is difficult to learn sufficient information
to distinguish the person, thus, short utterance speaker recognition is highly challenging. In …

Multi-level attention network: Mixed time–frequency channel attention and multi-scale self-attentive standard deviation pooling for speaker recognition

L Deng, F Deng, K Zhou, P Jiang, G Zhang… - … Applications of Artificial …, 2024 - Elsevier
In this paper, we propose a more efficient lightweight speaker recognition network, the multi-
level attention network (MANet). MANet aims to generate more robust and discriminative …

VoiceExtender: Short-Utterance Text-Independent Speaker Verification With Guided Diffusion Model

Y He, Z Kang, J Wang, J Peng… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Speaker verification (SV) performance deteriorates as utterances become shorter. To this
end, we propose a new architecture called VoiceExtender which provides a promising …

End-to-end deep speaker embedding learning using multi-scale attentional fusion and graph neural networks

HB Kashani, S Jazmi - Expert Systems with Applications, 2023 - Elsevier
As an attractive research in biometric authentication, Text Independent Speaker Verification
(TI-SV) problem aims to specify whether two given unconstrained utterances come from the …