Statistical parametric speech synthesis incorporating generative adversarial networks. Y Saito, S Takamichi, H Saruwatari. IEEE/ACM Transactions on Audio, Speech, and Language Processing 26 (1), 84-96, 2017 | 264 | 2017 |
Non-parallel voice conversion using variational autoencoders conditioned by phonetic posteriorgrams and d-vectors. Y Saito, Y Ijima, K Nishida, S Takamichi. 2018 IEEE International Conference on Acoustics, Speech and Signal …, 2018 | 142 | 2018 |
JVS corpus: free Japanese multi-speaker voice corpus. S Takamichi, K Mitsui, Y Saito, T Koriyama, N Tanji, H Saruwatari. arXiv preprint arXiv:1908.06248, 2019 | 68 | 2019 |
Voice conversion using sequence-to-sequence learning of context posterior probabilities. H Miyoshi, Y Saito, S Takamichi, H Saruwatari. arXiv preprint arXiv:1704.02360, 2017 | 68 | 2017 |
JSUT and JVS: Free Japanese voice corpora for accelerating speech synthesis research. S Takamichi, R Sonobe, K Mitsui, Y Saito, T Koriyama, N Tanji, ... Acoustical Science and Technology 41 (5), 761-768, 2020 | 57 | 2020 |
Phase reconstruction from amplitude spectrograms based on von-Mises-distribution deep neural network. S Takamichi, Y Saito, N Takamune, D Kitamura, H Saruwatari. 2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC …, 2018 | 47 | 2018 |
Training algorithm to deceive anti-spoofing verification for DNN-based speech synthesis. Y Saito, S Takamichi, H Saruwatari. 2017 IEEE International Conference on Acoustics, Speech and Signal …, 2017 | 37 | 2017 |
Voice conversion using input-to-output highway networks. Y Saito, S Takamichi, H Saruwatari. IEICE Transactions on Information and Systems 100 (8), 1925-1928, 2017 | 32 | 2017 |
Text-to-speech synthesis using STFT spectra based on low-/multi-resolution generative adversarial networks. Y Saito, S Takamichi, H Saruwatari. 2018 IEEE International Conference on Acoustics, Speech and Signal …, 2018 | 29 | 2018 |
Phase reconstruction from amplitude spectrograms based on directional-statistics deep neural networks. S Takamichi, Y Saito, N Takamune, D Kitamura, H Saruwatari. Signal Processing 169, 107368, 2020 | 25 | 2020 |
Cross-Lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space. D Xin, Y Saito, S Takamichi, T Koriyama, H Saruwatari. Interspeech, 2947-2951, 2020 | 22 | 2020 |
Face2Speech: Towards Multi-Speaker Text-to-Speech Synthesis Using an Embedding Vector Predicted from a Face Image. S Goto, K Onishi, Y Saito, K Tachibana, K Mori. INTERSPEECH, 1321-1325, 2020 | 21 | 2020 |
Real-Time, Full-Band, Online DNN-Based Voice Conversion System Using a Single CPU. T Saeki, Y Saito, S Takamichi, H Saruwatari. INTERSPEECH, 1021-1022, 2020 | 14 | 2020 |
HumanGAN: generative adversarial network with human-based discriminator and its evaluation in speech perception modeling. K Fujii, Y Saito, S Takamichi, Y Baba, H Saruwatari. ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 14 | 2020 |
Perceptual-similarity-aware deep speaker representation learning for multi-speaker generative modeling. Y Saito, S Takamichi, H Saruwatari. IEEE/ACM Transactions on Audio, Speech, and Language Processing 29, 1033-1048, 2021 | 13 | 2021 |
Cross-Lingual Speaker Adaptation Using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis. D Xin, Y Saito, S Takamichi, T Koriyama, H Saruwatari. Interspeech, 1614-1618, 2021 | 12 | 2021 |
Vocoder-free text-to-speech synthesis incorporating generative adversarial networks using low-/multi-frequency STFT amplitude spectra. Y Saito, S Takamichi, H Saruwatari. Computer Speech & Language 58, 347-363, 2019 | 12 | 2019 |
DNN-based speaker embedding using subjective inter-speaker similarity for multi-speaker modeling in speech synthesis. Y Saito, S Takamichi, H Saruwatari. arXiv preprint arXiv:1907.08294, 2019 | 12 | 2019 |
Generative moment matching network-based random modulation post-filter for DNN-based singing voice synthesis and neural double-tracking. H Tamaru, Y Saito, S Takamichi, T Koriyama, H Saruwatari. ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019 | 10 | 2019 |
Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech. D Yang, T Koriyama, Y Saito, T Saeki, D Xin, H Saruwatari. ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 8 | 2023 |