Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is a lack of …

Mel frequency cepstral coefficient and its applications: A review

ZK Abdul, AK Al-Talabani - IEEE Access, 2022 - ieeexplore.ieee.org
Feature extraction and representation has significant impact on the performance of any
machine learning method. Mel Frequency Cepstrum Coefficient (MFCC) is designed to …

The INTERSPEECH 2020 far-field speaker verification challenge

X Qin, M Li, H Bu, W Rao, RK Das… - arXiv preprint arXiv …, 2020 - arxiv.org
The INTERSPEECH 2020 Far-Field Speaker Verification Challenge (FFSVC 2020)
addresses three different research problems under well-defined conditions: far-field text …

Far-Field End-to-End Text-Dependent Speaker Verification Based on Mixed Training Data with Transfer Learning and Enrollment Data Augmentation

X Qin, D Cai, M Li - Interspeech, 2019 - isca-archive.org
In this paper, we focus on the far-field end-to-end text-dependent speaker verification task
with a small-scale far-field text-dependent dataset and a large-scale close-talking text …

Robust multi-channel far-field speaker verification under different in-domain data availability scenarios

X Qin, D Cai, M Li - IEEE/ACM Transactions on Audio, Speech …, 2022 - ieeexplore.ieee.org
The popularity and application of smart home devices have made far-field speaker
verification an urgent need. However, speaker verification performance is unsatisfactory …

The DKU audio-visual wake word spotting system for the 2021 MISP challenge

M Cheng, H Wang, Y Wang, M Li - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
This paper describes the system developed by the DKU team for the MISP Challenge 2021.
We present a two-stage approach consisting of end-to-end neural networks for the audio …

VE-KWS: Visual modality enhanced end-to-end keyword spotting

A Zhang, H Wang, P Guo, Y Fu, L Xie… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
The performance of the keyword spotting (KWS) system based on audio modality, commonly
measured in false alarms and false rejects, degrades significantly under the far field and …

Deep feature CycleGANs: Speaker identity preserving non-parallel microphone-telephone domain adaptation for speaker verification

S Kataria, J Villalba, P Żelasko… - arXiv preprint arXiv …, 2021 - arxiv.org
With the increase in the availability of speech from varied domains, it is imperative to use
such out-of-domain data to improve existing speech systems. Domain adaptation is a …

Royalflush speaker diarization system for ICASSP 2022 multi-channel multi-party meeting transcription challenge

J Tian, X Hu, X Xu - arXiv preprint arXiv:2202.04814, 2022 - arxiv.org
This paper describes the Royalflush speaker diarization system submitted to the Multi-
channel Multi-party Meeting Transcription Challenge (M2MeT). Our system comprises …

MultiSV: Dataset for far-field multi-channel speaker verification

L Mošner, O Plchot, L Burget… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Motivated by unconsolidated data situation and the lack of a standard benchmark in the
field, we complement our previous efforts and present a comprehensive corpus designed for …