Speaker embedding extraction with phonetic information

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier

Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is lack of …

Cited by 419 · All 9 versions

Disentangling voice and content with self-supervision for speaker recognition

T Liu, KA Lee, Q Wang, H Li - Advances in Neural …, 2023 - proceedings.neurips.cc

For speaker recognition, it is difficult to extract an accurate speaker representation from
speech because of its mixture of speaker traits and content. This paper proposes a …

Cited by 26 · All 9 versions

On-the-fly data loader and utterance-level aggregation for speaker and language recognition

W Cai, J Chen, J Zhang, M Li - IEEE/ACM Transactions on …, 2020 - ieeexplore.ieee.org

In this article, our recent efforts on directly modeling utterance-level aggregation for speaker
and language recognition is summarized. First, an on-the-fly data loader for efficient network …

Cited by 96 · All 3 versions

Leveraging ASR pretrained conformers for speaker verification through transfer learning and knowledge distillation

D Cai, M Li - IEEE/ACM Transactions on Audio, Speech, and …, 2024 - ieeexplore.ieee.org

This paper focuses on the application of Conformers in speaker verification. Conformers,
initially designed for Automatic Speech Recognition (ASR), excel at modeling both local and …

Cited by 12 · All 2 versions

CNN with phonetic attention for text-independent speaker verification

T Zhou, Y Zhao, J Li, Y Gong… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org

Text-independent speaker verification imposes no constraints on the spoken content and
usually needs long observations to make reliable prediction. In this paper, we propose two …

Cited by 70 · All 4 versions

Multi-resolution multi-head attention in deep speaker embedding

Z Wang, K Yao, X Li, S Fang - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org

Pooling is an essential component to capture long-term speaker characteristics for speaker
recognition. This paper proposes simple but effective pooling methods to compute attentive …

Cited by 52

MEConformer: Highly representative embedding extractor for speaker verification via incorporating selective convolution into deep speaker encoder

Q Zheng, Z Chen, Z Wang, H Liu, M Lin - Expert Systems with Applications, 2024 - Elsevier

Transformer models have demonstrated superior performance across various domains,
including computer vision, natural language processing, and speech recognition. The …

Cited by 11

SVoice: Enabling voice communication in silence via acoustic sensing on commodity devices

Y Fu, S Wang, L Zhong, L Chen, J Ren… - Proceedings of the 20th …, 2022 - dl.acm.org

Silent Speech Interface (SSI) has been proposed as a means of reconstructing audible
speech from silent articulatory gestures for covert voice communication in public and voice …

Cited by 19

Improving continuous sign language recognition with consistency constraints and signer removal

R Zuo, B Mak - ACM Transactions on Multimedia Computing …, 2024 - dl.acm.org

Deep-learning-based continuous sign language recognition (CSLR) models typically consist
of a visual module, a sequential module, and an alignment module. However, the …

Cited by 13 · All 3 versions

UltraSR: Silent speech reconstruction via acoustic sensing

Y Fu, S Wang, L Zhong, L Chen, J Ren… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Silent Speech Interfaces (SSI) have been developed to convert silent articulatory gestures
into speech, aiding communication in public spaces and assisting individuals with aphasia …

Cited by 2 · All 3 versions