Deep learning for text style transfer: A survey

D Jin, Z Jin, Z Hu, O Vechtomova… - Computational …, 2022 - direct.mit.edu
Text style transfer is an important task in natural language generation, which aims to control
certain attributes in the generated text, such as politeness, emotion, humor, and many …

VQMIVC: Vector quantization and mutual information-based unsupervised speech representation disentanglement for one-shot voice conversion

D Wang, L Deng, YT Yeung, X Chen, X Liu… - arXiv preprint arXiv …, 2021 - arxiv.org
One-shot voice conversion (VC), which performs conversion across arbitrary speakers with
only a single target-speaker utterance for reference, can be effectively achieved by speech …

Interpretability for reliable, efficient, and self-cognitive DNNs: From theories to applications

X Kang, J Guo, B Song, B Cai, H Sun, Z Zhang - Neurocomputing, 2023 - Elsevier
In recent years, remarkable achievements have been made in artificial intelligence tasks
and applications based on deep neural networks (DNNs), especially in the fields of vision …

VoiceMixer: Adversarial voice style mixup

SH Lee, JH Kim, H Chung… - Advances in Neural …, 2021 - proceedings.neurips.cc
Although recent advances in voice conversion have shown significant improvement, there
still remains a gap between the converted voice and target voice. A key factor that maintains …

Semantic feature extraction for generalized zero-shot learning

J Kim, K Shim, B Shim - Proceedings of the AAAI conference on artificial …, 2022 - ojs.aaai.org
Generalized zero-shot learning (GZSL) is a technique to train a deep learning model to
identify unseen classes using the attribute. In this paper, we put forth a new GZSL technique …

DRVC: A framework of any-to-any voice conversion with self-supervised learning

Q Wang, X Zhang, J Wang, N Cheng… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Any-to-any voice conversion problem aims to convert voices for source and target speakers,
which are out of the training data. Previous works widely utilize the disentangle-based …

Duration controllable voice conversion via phoneme-based information bottleneck

SH Lee, HR Noh, WJ Nam… - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org
Several voice conversion (VC) methods using a simple autoencoder with a carefully
designed information bottleneck have recently been studied. In general, they extract content …

Zero-shot voice conditioning for denoising diffusion TTS models

A Levkovitch, E Nachmani, L Wolf - arXiv preprint arXiv:2206.02246, 2022 - arxiv.org
We present a novel way of conditioning a pretrained denoising diffusion speech model to
produce speech in the voice of a novel person unseen during training. The method requires …

Stylized data-to-text generation: A case study in the e-commerce domain

L Jing, X Song, X Lin, Z Zhao, W Zhou… - ACM Transactions on …, 2023 - dl.acm.org
Existing data-to-text generation efforts mainly focus on generating a coherent text from
non-linguistic input data, such as tables and attribute–value pairs, but overlook that different …

Adversarially learning disentangled speech representations for robust multi-factor voice conversion

J Wang, J Li, X Zhao, Z Wu, S Kang, H Meng - arXiv preprint arXiv …, 2021 - arxiv.org
Factorizing speech as disentangled speech representations is vital to achieve highly
controllable style transfer in voice conversion (VC). Conventional speech representation …