JSUT and JVS: Free Japanese voice corpora for accelerating speech synthesis research

S Takamichi, R Sonobe, K Mitsui, Y Saito… - Acoustical Science …, 2020 - jstage.jst.go.jp
In this paper, we develop two corpora for speech synthesis research. Thanks to
improvements in machine learning techniques, including deep learning, speech synthesis is …

Nvc-net: End-to-end adversarial voice conversion

B Nguyen, F Cardinaux - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Voice conversion (VC) has gained increasing popularity in many speech synthesis
applications. The idea is to change the voice identity from one speaker into another while …

JVS corpus: free Japanese multi-speaker voice corpus

S Takamichi, K Mitsui, Y Saito, T Koriyama… - arXiv preprint arXiv …, 2019 - arxiv.org
Thanks to improvements in machine learning techniques, including deep learning, speech
synthesis is becoming a machine learning task. To accelerate speech synthesis research …

JVS-MuSiC: Japanese multispeaker singing-voice corpus

H Tamaru, S Takamichi, N Tanji… - arXiv preprint arXiv …, 2020 - arxiv.org
Thanks to developments in machine learning techniques, it has become possible to
synthesize high-quality singing voices of a single singer. An open multispeaker singing …

Cross-Lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space

D Xin, Y Saito, S Takamichi, T Koriyama… - …, 2020 - interspeech2020.org
We present a method for improving the performance of cross-lingual text-to-speech
synthesis. Previous works are able to model speaker individuality in speaker space via …

Perceptual-similarity-aware deep speaker representation learning for multi-speaker generative modeling

Y Saito, S Takamichi… - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
We propose novel deep speaker representation learning that considers perceptual similarity
among speakers for multi-speaker generative modeling. Following its success in accurate …

Deep Gaussian process based multi-speaker speech synthesis with latent speaker representation

K Mitsui, T Koriyama, H Saruwatari - Speech Communication, 2021 - Elsevier
This paper proposes deep Gaussian process (DGP)-based frameworks for multi-speaker
speech synthesis and speaker representation learning. A DGP has a deep architecture of …

Group‐level brain decoding with deep learning

R Csaky, MWJ Van Es, OP Jones… - Human Brain …, 2023 - Wiley Online Library
Decoding brain imaging data is gaining popularity, with applications in brain‐computer
interfaces and the study of neural representations. Decoding is typically subject‐specific and …

Multi-speaker text-to-speech synthesis using deep Gaussian processes

K Mitsui, T Koriyama, H Saruwatari - arXiv preprint arXiv:2008.02950, 2020 - arxiv.org
Multi-speaker speech synthesis is a technique for modeling multiple speakers' voices with a
single model. Although many approaches using deep neural networks (DNNs) have been …

Perceptual Analysis of Speaker Embeddings for Voice Discrimination between Machine and Human Listening

I Thoidis, C Gaultier, T Goehring - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
This study investigates the information captured by speaker embeddings with relevance to
human speech perception. A Convolutional Neural Network was trained to perform one-shot …