Collaborative joint training with multitask recurrent model for speech and speaker recognition

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier

Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is lack of …

Cited by 390 - Related articles - All 9 versions

[PDF] arxiv.org

Deep representation learning in speech processing: Challenges, recent advances, and future trends

S Latif, R Rana, S Khalifa, R Jurdak, J Qadir… - arXiv preprint arXiv …, 2020 - arxiv.org

Research on speech processing has traditionally considered the task of designing hand-
engineered acoustic features (feature engineering) as a separate distinct problem from the …

Cited by 104 - Related articles - All 3 versions

[PDF] arxiv.org

Two-stream collaborative learning with spatial-temporal attention for video classification

Y Peng, Y Zhao, J Zhang - … on Circuits and Systems for Video …, 2018 - ieeexplore.ieee.org

Video classification is highly important and has widespread applications, such as video
search and intelligent surveillance. Video naturally contains both static and motion …

Cited by 131 - Related articles - All 4 versions

Leveraging ASR pretrained conformers for speaker verification through transfer learning and knowledge distillation

D Cai, M Li - IEEE/ACM Transactions on Audio, Speech, and …, 2024 - ieeexplore.ieee.org

This paper focuses on the application of Conformers in speaker verification. Conformers,
initially designed for Automatic Speech Recognition (ASR), excel at modeling both local and …

Cited by 9 - Related articles - All 2 versions

GPRI2Net: A deep-neural-network-based ground penetrating radar data inversion and object identification framework for consecutive and long survey lines

J Wang, H Liu, P Jiang, Z Wang, Q Sui… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

Ground penetrating radar (GPR) enables infrastructure inspection using consecutive and
long survey lines. However, the existing GPR data processing methods may lead to …

Cited by 35 - Related articles - All 2 versions

[PDF] arxiv.org

Speaker embedding extraction with phonetic information

Y Liu, L He, J Liu, MT Johnson - arXiv preprint arXiv:1804.04862, 2018 - arxiv.org

Speaker embeddings achieve promising results on many speaker verification tasks.
Phonetic information, as an important component of speech, is rarely considered in the …

Cited by 82 - Related articles - All 8 versions

[PDF] researchgate.net

A semantic-aware strategy for automatic speech recognition incorporating deep learning models

A Santhanavijayan, D Naresh Kumar… - Intelligent System Design …, 2021 - Springer

Automatic Speech Recognition (ASR) is trending in the age of the Internet of Things
and Machine Intelligence. It plays a pivotal role in several applications. Conventional …

Cited by 48 - Related articles - All 3 versions

[PDF] arxiv.org

Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition

Q Shao, P Guo, J Yan, P Hu… - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org

Accents pose significant challenges for speech recognition systems. Although joint
automatic speech recognition (ASR) and accent recognition (AR) training has been proven …

Cited by 4 - Related articles - All 4 versions

Multi-task twin bounded support vector machine and its safe screening rule

R An, Y Xu, X Liu - Applied Soft Computing, 2023 - Elsevier

Direct multi-task twin support vector machine (DMTSVM) obtains great performance in
dealing with correlated tasks. However, DMTSVM only considers the empirical risk …

Cited by 6 - Related articles - All 2 versions

Phoneme-unit-specific time-delay neural network for speaker verification

X Chen, C Bao - IEEE/ACM Transactions on Audio, Speech …, 2021 - ieeexplore.ieee.org

Variations of speech content increase the difficulty of speaker verification. In this paper, to
alleviate the negative effect of the variations, phoneme-unit-specific time-delay neural …

Cited by 23 - Related articles - All 2 versions