Speaker embedding extraction with phonetic information

Y Liu, L He, J Liu, MT Johnson - arXiv preprint arXiv:1804.04862, 2018 - arxiv.org
Speaker embeddings achieve promising results on many speaker verification tasks.
Phonetic information, as an important component of speech, is rarely considered in the …

Speech emotion recognition based on genetic algorithm–decision tree fusion of deep and acoustic features

L Sun, Q Li, S Fu, P Li - ETRI Journal, 2022 - Wiley Online Library
Although researchers have proposed numerous techniques for speech emotion recognition,
its performance remains unsatisfactory in many application scenarios. In this study, we …

Self-supervised speaker embeddings

T Stafylakis, J Rohdin, O Plchot, P Mizera… - arXiv preprint arXiv …, 2019 - arxiv.org
Contrary to i-vectors, speaker embeddings such as x-vectors are incapable of leveraging
unlabelled utterances, due to the classification loss over training speakers. In this paper, we …

Deep classification of sound: A concise review

S Bhattacharya, N Das, S Sahu, A Mondal… - Proceeding of First …, 2021 - Springer
Sound classification is a task to classify sounds into different classes. In earlier days, the
researchers mainly used conventional machine learning techniques to classify sounds. In …

Adversarially learning disentangled speech representations for robust multi-factor voice conversion

J Wang, J Li, X Zhao, Z Wu, S Kang, H Meng - arXiv preprint arXiv …, 2021 - arxiv.org
Factorizing speech as disentangled speech representations is vital to achieve highly
controllable style transfer in voice conversion (VC). Conventional speech representation …

Deep normalization for speaker vectors

Y Cai, L Li, A Abel, X Zhu… - IEEE/ACM Transactions on …, 2020 - ieeexplore.ieee.org
Deep speaker embedding has demonstrated state-of-the-art performance in speaker
recognition tasks. However, one potential issue with this approach is that the speaker …

Phoneme-unit-specific time-delay neural network for speaker verification

X Chen, C Bao - IEEE/ACM Transactions on Audio, Speech …, 2021 - ieeexplore.ieee.org
Variations of speech content increase the difficulty of speaker verification. In this paper, to
alleviate the negative effect of the variations, phoneme-unit-specific time-delay neural …

Noise-robust voice conversion with domain adversarial training

H Du, L Xie, H Li - Neural Networks, 2022 - Elsevier
Voice conversion has made great progress in the past few years under the studio-quality test
scenario in terms of speech quality and speaker similarity. However, in real applications, test …

Decomposition and reorganization of phonetic information for speaker embedding learning

QB Hong, CH Wu, HM Wang - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org
Speech content is closely related to the stability of speaker embeddings in speaker
verification tasks. In this paper, we propose a novel architecture based on self-constraint …

Random cycle loss and its application to voice conversion

H Sun, D Wang, L Li, C Chen… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Speech disentanglement aims to decompose independent causal factors of speech signals
into separate codes. Perfect disentanglement benefits a broad range of speech …