WORLD: a vocoder-based high-quality speech synthesis system for real-time applications
A vocoder-based speech synthesis system, named WORLD, was developed in an effort to
improve the sound quality of real-time applications using speech. Speech analysis …
Non-autoregressive sequence-to-sequence voice conversion
This paper proposes a novel voice conversion (VC) method based on non-autoregressive
sequence-to-sequence (NAR-S2S) models. Inspired by the great success of NAR-S2S …
Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis
This paper introduces a general and flexible framework for F0 and aperiodicity (additive non
periodic component) analysis, specifically intended for high-quality speech synthesis and …
Noise and acoustic modeling with waveform generator in text-to-speech and neutral speech conversion
This article focuses on developing a system for high-quality synthesized and converted
speech by addressing three fundamental principles. Although the noise-like component in …
SparkNG: Interactive MATLAB Tools for Introduction to Speech Production, Perception and Processing Fundamentals and Application of the Aliasing-Free LF …
H Kawahara - INTERSPEECH, 2016 - isca-archive.org
This article introduces a set of interactive tools for studying fundamentals of speech
production, perception and processing. In addition to this voice production simulator, it …
Continuous vocoder applied in deep neural network based voice conversion
In this paper, a novel vocoder is proposed for a Statistical Voice Conversion (SVC)
framework using deep neural network, where multiple features from the speech of two …
Analysis and synthesis of strong vocal expressions: Extension and application of audio texture features to singing voice
H Kawahara, M Morise - 2012 IEEE International Conference …, 2012 - ieeexplore.ieee.org
Realistic reconstruction and manipulation of strong vocal expressions found in singing
voices is a challenging and exciting topic. A speech analysis, modification and resynthesis …
MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-level Feature Fusion
We introduce Multi-level feature Fusion-based Periodicity Analysis Model (MF-PAM), a novel
deep learning-based pitch estimation model that accurately estimates pitch trajectory in …
Continuous vocoder in feed-forward deep neural network based speech synthesis
Mohammed Salah Al-Radhi, Tamás Gábor … (http://smartlab.tmit.bme.hu)
A Fast and Accurate Fundamental Frequency Estimator Using Recursive Moving Average Filters
R Daido, Y Hisaminato - INTERSPEECH, 2016 - isca-archive.org
We propose a fundamental frequency (F0) estimation method which is fast, accurate and
suitable for real-time use. While the proposed method is based on the same framework as …