JSUT and JVS: Free Japanese voice corpora for accelerating speech synthesis research

S Takamichi, R Sonobe, K Mitsui, Y Saito… - Acoustical Science …, 2020 - jstage.jst.go.jp
In this paper, we develop two corpora for speech synthesis research. Thanks to
improvements in machine learning techniques, including deep learning, speech synthesis is …

Nvc-net: End-to-end adversarial voice conversion

B Nguyen, F Cardinaux - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Voice conversion (VC) has gained increasing popularity in many speech synthesis
applications. The idea is to change the voice identity from one speaker into another while …

JVS corpus: free Japanese multi-speaker voice corpus

S Takamichi, K Mitsui, Y Saito, T Koriyama… - arXiv preprint arXiv …, 2019 - arxiv.org
Thanks to improvements in machine learning techniques, including deep learning, speech
synthesis is becoming a machine learning task. To accelerate speech synthesis research …

JVS-MuSiC: Japanese multispeaker singing-voice corpus

H Tamaru, S Takamichi, N Tanji… - arXiv preprint arXiv …, 2020 - arxiv.org
Thanks to developments in machine learning techniques, it has become possible to
synthesize high-quality singing voices of a single singer. An open multispeaker singing …

Cross-Lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space

D Xin, Y Saito, S Takamichi, T Koriyama… - …, 2020 - interspeech2020.org
We present a method for improving the performance of cross-lingual text-to-speech
synthesis. Previous works are able to model speaker individuality in speaker space via …

Perceptual-similarity-aware deep speaker representation learning for multi-speaker generative modeling

Y Saito, S Takamichi… - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
We propose novel deep speaker representation learning that considers perceptual similarity
among speakers for multi-speaker generative modeling. Following its success in accurate …

Deep Gaussian process based multi-speaker speech synthesis with latent speaker representation

K Mitsui, T Koriyama, H Saruwatari - Speech Communication, 2021 - Elsevier
This paper proposes deep Gaussian process (DGP)-based frameworks for multi-speaker
speech synthesis and speaker representation learning. A DGP has a deep architecture of …

Group‐level brain decoding with deep learning

R Csaky, MWJ Van Es, OP Jones… - Human Brain …, 2023 - Wiley Online Library
Decoding brain imaging data is gaining popularity, with applications in brain‐computer
interfaces and the study of neural representations. Decoding is typically subject‐specific and …

Multi-speaker text-to-speech synthesis using deep Gaussian processes

K Mitsui, T Koriyama, H Saruwatari - arXiv preprint arXiv:2008.02950, 2020 - arxiv.org
Multi-speaker speech synthesis is a technique for modeling multiple speakers' voices with a
single model. Although many approaches using deep neural networks (DNNs) have been …

Perceptual Analysis of Speaker Embeddings for Voice Discrimination between Machine and Human Listening

I Thoidis, C Gaultier, T Goehring - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
This study investigates the information captured by speaker embeddings with relevance to
human speech perception. A Convolutional Neural Network was trained to perform one-shot …