Speaker embedding extraction with phonetic information
Speaker embeddings achieve promising results on many speaker verification tasks.
Phonetic information, as an important component of speech, is rarely considered in the …
Speech emotion recognition based on genetic algorithm–decision tree fusion of deep and acoustic features
L Sun, Q Li, S Fu, P Li - ETRI Journal, 2022 - Wiley Online Library
Although researchers have proposed numerous techniques for speech emotion recognition,
its performance remains unsatisfactory in many application scenarios. In this study, we …
Self-supervised speaker embeddings
Contrary to i-vectors, speaker embeddings such as x-vectors are incapable of leveraging
unlabelled utterances, due to the classification loss over training speakers. In this paper, we …
Deep classification of sound: A concise review
Sound classification is a task to classify sounds into different classes. In earlier days, the
researchers mainly used conventional machine learning techniques to classify sounds. In …
Adversarially learning disentangled speech representations for robust multi-factor voice conversion
Factorizing speech as disentangled speech representations is vital to achieve highly
controllable style transfer in voice conversion (VC). Conventional speech representation …
Deep normalization for speaker vectors
Deep speaker embedding has demonstrated state-of-the-art performance in speaker
recognition tasks. However, one potential issue with this approach is that the speaker …
Phoneme-unit-specific time-delay neural network for speaker verification
X Chen, C Bao - IEEE/ACM Transactions on Audio, Speech …, 2021 - ieeexplore.ieee.org
Variations of speech content increase the difficulty of speaker verification. In this paper, to
alleviate the negative effect of the variations, phoneme-unit-specific time-delay neural …
Noise-robust voice conversion with domain adversarial training
Voice conversion has made great progress in the past few years under the studio-quality test
scenario in terms of speech quality and speaker similarity. However, in real applications, test …
Decomposition and reorganization of phonetic information for speaker embedding learning
Speech content is closely related to the stability of speaker embeddings in speaker
verification tasks. In this paper, we propose a novel architecture based on self-constraint …
Random cycle loss and its application to voice conversion
Speech disentanglement aims to decompose independent causal factors of speech signals
into separate codes. Perfect disentanglement benefits to a broad range of speech …