Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is a lack of …

Deep representation learning in speech processing: Challenges, recent advances, and future trends

S Latif, R Rana, S Khalifa, R Jurdak, J Qadir… - arXiv preprint arXiv …, 2020 - arxiv.org
Research on speech processing has traditionally considered the task of designing hand-
engineered acoustic features (feature engineering) as a separate distinct problem from the …

Two-stream collaborative learning with spatial-temporal attention for video classification

Y Peng, Y Zhao, J Zhang - … on Circuits and Systems for Video …, 2018 - ieeexplore.ieee.org
Video classification is highly important and has widespread applications, such as video
search and intelligent surveillance. Video naturally contains both static and motion …

Leveraging ASR pretrained conformers for speaker verification through transfer learning and knowledge distillation

D Cai, M Li - IEEE/ACM Transactions on Audio, Speech, and …, 2024 - ieeexplore.ieee.org
This paper focuses on the application of Conformers in speaker verification. Conformers,
initially designed for Automatic Speech Recognition (ASR), excel at modeling both local and …

GPRI2Net: A deep-neural-network-based ground penetrating radar data inversion and object identification framework for consecutive and long survey lines

J Wang, H Liu, P Jiang, Z Wang, Q Sui… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Ground penetrating radar (GPR) enables infrastructure inspection using consecutive and
long survey lines. However, the existing GPR data processing methods may lead to …

Speaker embedding extraction with phonetic information

Y Liu, L He, J Liu, MT Johnson - arXiv preprint arXiv:1804.04862, 2018 - arxiv.org
Speaker embeddings achieve promising results on many speaker verification tasks.
Phonetic information, as an important component of speech, is rarely considered in the …

A semantic-aware strategy for automatic speech recognition incorporating deep learning models

A Santhanavijayan, D Naresh Kumar… - Intelligent System Design …, 2021 - Springer
Automatic Speech Recognition (ASR) is trending in the age of the Internet of Things
and Machine Intelligence. It plays a pivotal role in several applications. Conventional …

Decoupling and interacting multi-task learning network for joint speech and accent recognition

Q Shao, P Guo, J Yan, P Hu… - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org
Accents pose significant challenges for speech recognition systems. Although joint
automatic speech recognition (ASR) and accent recognition (AR) training has been proven …

Multi-task twin bounded support vector machine and its safe screening rule

R An, Y Xu, X Liu - Applied Soft Computing, 2023 - Elsevier
Direct multi-task twin support vector machine (DMTSVM) obtains great performance in
dealing with correlated tasks. However, DMTSVM only considers the empirical risk …

Phoneme-unit-specific time-delay neural network for speaker verification

X Chen, C Bao - IEEE/ACM Transactions on Audio, Speech …, 2021 - ieeexplore.ieee.org
Variations of speech content increase the difficulty of speaker verification. In this paper, to
alleviate the negative effect of the variations, phoneme-unit-specific time-delay neural …