JSUT and JVS: Free Japanese voice corpora for accelerating speech synthesis research
In this paper, we develop two corpora for speech synthesis research. Thanks to
improvements in machine learning techniques, including deep learning, speech synthesis is …
NVC-Net: End-to-end adversarial voice conversion
B Nguyen, F Cardinaux - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Voice conversion (VC) has gained increasing popularity in many speech synthesis
applications. The idea is to change the voice identity from one speaker into another while …
JVS corpus: free Japanese multi-speaker voice corpus
Thanks to improvements in machine learning techniques, including deep learning, speech
synthesis is becoming a machine learning task. To accelerate speech synthesis research …
JVS-MuSiC: Japanese multispeaker singing-voice corpus
H Tamaru, S Takamichi, N Tanji… - arXiv preprint arXiv …, 2020 - arxiv.org
Thanks to developments in machine learning techniques, it has become possible to
synthesize high-quality singing voices of a single singer. An open multispeaker singing …
Cross-Lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space.
We present a method for improving the performance of cross-lingual text-to-speech
synthesis. Previous works are able to model speaker individuality in speaker space via …
Perceptual-similarity-aware deep speaker representation learning for multi-speaker generative modeling
Y Saito, S Takamichi… - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
We propose novel deep speaker representation learning that considers perceptual similarity
among speakers for multi-speaker generative modeling. Following its success in accurate …
Deep Gaussian process based multi-speaker speech synthesis with latent speaker representation
K Mitsui, T Koriyama, H Saruwatari - Speech Communication, 2021 - Elsevier
This paper proposes deep Gaussian process (DGP)-based frameworks for multi-speaker
speech synthesis and speaker representation learning. A DGP has a deep architecture of …
Group‐level brain decoding with deep learning
Decoding brain imaging data is gaining popularity, with applications in brain‐computer
interfaces and the study of neural representations. Decoding is typically subject‐specific and …
Multi-speaker text-to-speech synthesis using deep Gaussian processes
K Mitsui, T Koriyama, H Saruwatari - arXiv preprint arXiv:2008.02950, 2020 - arxiv.org
Multi-speaker speech synthesis is a technique for modeling multiple speakers' voices with a
single model. Although many approaches using deep neural networks (DNNs) have been …
Perceptual Analysis of Speaker Embeddings for Voice Discrimination between Machine And Human Listening
This study investigates the information captured by speaker embeddings with relevance to
human speech perception. A Convolutional Neural Network was trained to perform one-shot …