RawNeXt: Speaker verification system for variable-duration utterances with deep layer aggregation...

Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation

H Tak, M Todisco, X Wang, J Jung, J Yamagishi… - arXiv preprint arXiv …, 2022 - arxiv.org

The performance of spoofing countermeasure systems depends fundamentally upon the use
of sufficiently representative training data. With this usually being limited, current solutions …

Cited by 145 · All 9 versions

Speech and speaker recognition using raw waveform modeling for adult and children's speech: A comprehensive review

K Radha, M Bansal, RB Pachori - Engineering Applications of Artificial …, 2024 - Elsevier

Conventionally, the extraction of hand-crafted acoustic features has been separated from the
task of establishing robust machine-learning models in speech processing. The manual …

Cited by 6

Pushing the limits of raw waveform speaker recognition

J Jung, YJ Kim, HS Heo, BJ Lee, Y Kwon… - arXiv preprint arXiv …, 2022 - arxiv.org

In recent years, speaker recognition systems based on raw waveform inputs have received
increasing attention. However, the performance of such systems is typically inferior to the …

Cited by 92 · All 11 versions

Frequency and multi-scale selective kernel attention for speaker verification

SH Mun, J Jung, MH Han… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org

The majority of recent state-of-the-art speaker verification architectures adopt multi-scale
processing and frequency-channel attention mechanisms. Convolutional layers of these …

Cited by 22 · All 5 versions

Short-segment speaker verification using ECAPA-TDNN with multi-resolution encoder

S Han, Y Ahn, K Kang, JW Shin - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org

Time-domain approaches have shown the potential to improve the performance of speaker
verification, but still predominant approaches utilize hand-crafted features such as the mel …

Cited by 10

RSKNet-MTSP: Effective and portable deep architecture for speaker verification

Y Wu, C Guo, J Zhao, X Jin, J Xu - Neurocomputing, 2022 - Elsevier

The convolutional neural network (CNN) based approaches have shown great success for
speaker verification (SV) tasks, where modeling long temporal context and reducing …

Cited by 11 · All 4 versions

Fisher ratio-based multi-domain frame-level feature aggregation for short utterance speaker verification

Y Zi, S Xiong - Engineering Applications of Artificial Intelligence, 2024 - Elsevier

As the durations of the short utterances are small, it is difficult to learn sufficient information
to distinguish the person, thus, short utterance speaker recognition is highly challenging. In …

Multi-level attention network: Mixed time–frequency channel attention and multi-scale self-attentive standard deviation pooling for speaker recognition

L Deng, F Deng, K Zhou, P Jiang, G Zhang… - … Applications of Artificial …, 2024 - Elsevier

In this paper, we propose a more efficient lightweight speaker recognition network, the
multi-level attention network (MANet). MANet aims to generate more robust and discriminative …

Cited by 2 · All 2 versions

VoiceExtender: Short-Utterance Text-Independent Speaker Verification With Guided Diffusion Model

Y He, Z Kang, J Wang, J Peng… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

Speaker verification (SV) performance deteriorates as utterances become shorter. To this
end, we propose a new architecture called VoiceExtender which provides a promising …

Cited by 1 · All 4 versions

End-to-end deep speaker embedding learning using multi-scale attentional fusion and graph neural networks

HB Kashani, S Jazmi - Expert Systems with Applications, 2023 - Elsevier

As an attractive research in biometric authentication, Text Independent Speaker Verification
(TI-SV) problem aims to specify whether two given unconstrained utterances come from the …

Cited by 3