Deep factorization for speech signal

Y Liu, L He, J Liu, MT Johnson - arXiv preprint arXiv:1804.04862, 2018 - arxiv.org

Speaker embeddings achieve promising results on many speaker verification tasks.
Phonetic information, as an important component of speech, is rarely considered in the …

Cited by 82 · Related articles · All 8 versions

Speech emotion recognition based on genetic algorithm–decision tree fusion of deep and acoustic features

L Sun, Q Li, S Fu, P Li - ETRI Journal, 2022 - Wiley Online Library

Although researchers have proposed numerous techniques for speech emotion recognition,
its performance remains unsatisfactory in many application scenarios. In this study, we …

Cited by 24 · Related articles · All 4 versions

Self-supervised speaker embeddings

T Stafylakis, J Rohdin, O Plchot, P Mizera… - arXiv preprint arXiv …, 2019 - arxiv.org

Contrary to i-vectors, speaker embeddings such as x-vectors are incapable of leveraging
unlabelled utterances, due to the classification loss over training speakers. In this paper, we …

Cited by 55 · Related articles · All 11 versions

Deep classification of sound: A concise review

S Bhattacharya, N Das, S Sahu, A Mondal… - Proceeding of First …, 2021 - Springer

Sound classification is a task to classify sounds into different classes. In earlier days, the
researchers mainly used conventional machine learning techniques to classify sounds. In …

Cited by 12 · Related articles · All 4 versions

Adversarially learning disentangled speech representations for robust multi-factor voice conversion

J Wang, J Li, X Zhao, Z Wu, S Kang, H Meng - arXiv preprint arXiv …, 2021 - arxiv.org

Factorizing speech as disentangled speech representations is vital to achieve highly
controllable style transfer in voice conversion (VC). Conventional speech representation …

Cited by 28 · Related articles · All 5 versions

Deep normalization for speaker vectors

Y Cai, L Li, A Abel, X Zhu… - IEEE/ACM Transactions on …, 2020 - ieeexplore.ieee.org

Deep speaker embedding has demonstrated state-of-the-art performance in speaker
recognition tasks. However, one potential issue with this approach is that the speaker …

Cited by 31 · Related articles · All 7 versions

Phoneme-unit-specific time-delay neural network for speaker verification

X Chen, C Bao - IEEE/ACM Transactions on Audio, Speech …, 2021 - ieeexplore.ieee.org

Variations of speech content increase the difficulty of speaker verification. In this paper, to
alleviate the negative effect of the variations, phoneme-unit-specific time-delay neural …

Cited by 23 · Related articles · All 2 versions

Noise-robust voice conversion with domain adversarial training

H Du, L Xie, H Li - Neural Networks, 2022 - Elsevier

Voice conversion has made great progress in the past few years under the studio-quality test
scenario in terms of speech quality and speaker similarity. However, in real applications, test …

Cited by 10 · Related articles · All 6 versions

Decomposition and reorganization of phonetic information for speaker embedding learning

QB Hong, CH Wu, HM Wang - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org

Speech content is closely related to the stability of speaker embeddings in speaker
verification tasks. In this paper, we propose a novel architecture based on self-constraint …

Cited by 8 · Related articles · All 6 versions

Random cycle loss and its application to voice conversion

H Sun, D Wang, L Li, C Chen… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Speech disentanglement aims to decompose independent causal factors of speech signals
into separate codes. Perfect disentanglement benefits to a broad range of speech …

Cited by 5 · Related articles · All 5 versions