Error evaluation of an F0-adaptive spectral envelope estimator in robustness against the...

M Morise, F Yokomori, K Ozawa - IEICE TRANSACTIONS on …, 2016 - search.ieice.org

A vocoder-based speech synthesis system, named WORLD, was developed in an effort to
improve the sound quality of real-time applications using speech. Speech analysis …

Cited by 1432 · Related articles · All 11 versions

[HTML] D4C, a band-aperiodicity estimator for high-quality speech synthesis

M Morise - Speech Communication, 2016 - Elsevier

An algorithm is proposed for estimating the band aperiodicity of speech signals, where
“aperiodicity” is defined as the power ratio between the speech signal and the aperiodic …

Cited by 224 · Related articles · All 6 versions

[PDF] Harvest: A High-Performance Fundamental Frequency Estimator from Speech Signals.

M Morise - INTERSPEECH, 2017 - isca-archive.org

A fundamental frequency (F0) estimator named Harvest is described. The unique points of
Harvest are that it can obtain a reliable F0 contour and reduce the error that the voiced …

Cited by 107 · Related articles · All 4 versions

A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis

S Takaki, J Yamagishi - 2016 IEEE International Conference on …, 2016 - ieeexplore.ieee.org

In the state-of-the-art statistical parametric speech synthesis system, a speech analysis
module, e.g., STRAIGHT spectral analysis, is generally used for obtaining accurate and stable …

Cited by 50 · Related articles · All 3 versions

Sound quality comparison among high-quality vocoders by using re-synthesized speech

M Morise, Y Watanabe - Acoustical Science and Technology, 2018 - jstage.jst.go.jp

Since we have released WORLD on GitHub and have been continuously updating
WORLD to improve the sound quality of the synthesized speech, there is no information on …

Cited by 29 · Related articles · All 3 versions

[PDF] Low-Dimensional Representation of Spectral Envelope Without Deterioration for Full-Band Speech Analysis/Synthesis System.

M Morise, G Miyashita, K Ozawa - INTERSPEECH, 2017 - researchgate.net

A speech coding for a full-band speech analysis/synthesis system is described. In this work,
full-band speech is defined as speech with a sampling frequency above 40 kHz, whose …

Cited by 13 · Related articles · All 5 versions

Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data

N Makishima, S Suzuki, A Ando… - arXiv preprint arXiv …, 2022 - arxiv.org

In this paper, we investigate the semi-supervised joint training of text to speech (TTS) and
automatic speech recognition (ASR), where a small amount of paired data and a large …

Cited by 2 · Related articles · All 6 versions

Voice conversion with CycleRNN-based spectral mapping and finely tuned WaveNet vocoder

PL Tobing, YC Wu, T Hayashi, K Kobayashi… - IEEE Access, 2019 - ieeexplore.ieee.org

In this paper, we present a novel framework for a voice conversion (VC) system based on a
cyclic recurrent neural network (CycleRNN) and a finely tuned WaveNet vocoder. Even …

Cited by 10 · Related articles · All 3 versions

Efficient shallow wavenet vocoder using multiple samples output based on laplacian distribution and linear prediction

PL Tobing, YC Wu, T Hayashi… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

This paper presents a novel way for an efficient implementation scheme of shallow WaveNet
vocoder with multiple samples (segment) output based on the use of Laplacian distribution …

Cited by 8 · Related articles

Human-in-the-loop speech-design system and its evaluation

D Kondo, M Morise - 2019 Asia-Pacific Signal and Information …, 2019 - ieeexplore.ieee.org

We propose human-in-the-loop (HITL) speech-design system with an interface. General text-
to-speech (TTS) systems generate the speech waveform from the input text without the need …

Cited by 6 · Related articles · All 3 versions