Deep learning for text style transfer: A survey
Text style transfer is an important task in natural language generation, which aims to control
certain attributes in the generated text, such as politeness, emotion, humor, and many …
VQMIVC: Vector quantization and mutual information-based unsupervised speech representation disentanglement for one-shot voice conversion
One-shot voice conversion (VC), which performs conversion across arbitrary speakers with
only a single target-speaker utterance for reference, can be effectively achieved by speech …
Interpretability for reliable, efficient, and self-cognitive DNNs: From theories to applications
In recent years, remarkable achievements have been made in artificial intelligence tasks
and applications based on deep neural networks (DNNs), especially in the fields of vision …
VoiceMixer: Adversarial voice style mixup
Although recent advances in voice conversion have shown significant improvement, there
still remains a gap between the converted voice and target voice. A key factor that maintains …
Semantic feature extraction for generalized zero-shot learning
Generalized zero-shot learning (GZSL) is a technique to train a deep learning model to
identify unseen classes using the attribute. In this paper, we put forth a new GZSL technique …
DRVC: A framework of any-to-any voice conversion with self-supervised learning
The any-to-any voice conversion problem aims to convert voices for source and target speakers
that are outside the training data. Previous works widely utilize the disentangle-based …
Duration controllable voice conversion via phoneme-based information bottleneck
Several voice conversion (VC) methods using a simple autoencoder with a carefully
designed information bottleneck have recently been studied. In general, they extract content …
Zero-shot voice conditioning for denoising diffusion TTS models
We present a novel way of conditioning a pretrained denoising diffusion speech model to
produce speech in the voice of a novel person unseen during training. The method requires …
Stylized data-to-text generation: A case study in the e-commerce domain
Existing data-to-text generation efforts mainly focus on generating a coherent text from non-
linguistic input data, such as tables and attribute–value pairs, but overlook that different …
Adversarially learning disentangled speech representations for robust multi-factor voice conversion
Factorizing speech as disentangled speech representations is vital to achieve highly
controllable style transfer in voice conversion (VC). Conventional speech representation …