WORLD: a vocoder-based high-quality speech synthesis system for real-time applications

M Morise, F Yokomori, K Ozawa - IEICE TRANSACTIONS on …, 2016 - search.ieice.org
A vocoder-based speech synthesis system, named WORLD, was developed in an effort to
improve the sound quality of real-time applications using speech. Speech analysis …

Non-autoregressive sequence-to-sequence voice conversion

T Hayashi, WC Huang, K Kobayashi… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
This paper proposes a novel voice conversion (VC) method based on non-autoregressive
sequence-to-sequence (NAR-S2S) models. Inspired by the great success of NAR-S2S …

Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis

H Kawahara, Y Agiomyrgiannakis, H Zen - arXiv preprint arXiv …, 2016 - arxiv.org
This paper introduces a general and flexible framework for F0 and aperiodicity (additive non
periodic component) analysis, specifically intended for high-quality speech synthesis and …

Noise and acoustic modeling with waveform generator in text-to-speech and neutral speech conversion

MS Al-Radhi, TG Csapó, G Németh - Multimedia Tools and Applications, 2021 - Springer
This article focuses on developing a system for high-quality synthesized and converted
speech by addressing three fundamental principles. Although the noise-like component in …

SparkNG: Interactive MATLAB Tools for Introduction to Speech Production, Perception and Processing Fundamentals and Application of the Aliasing-Free LF …

H Kawahara - INTERSPEECH, 2016 - isca-archive.org
This article introduces a set of interactive tools for studying fundamentals of speech
production, perception and processing. In addition to this voice production simulator, it …

Continuous vocoder applied in deep neural network based voice conversion

MS Al-Radhi, TG Csapó, G Németh - Multimedia Tools and Applications, 2019 - Springer
In this paper, a novel vocoder is proposed for a Statistical Voice Conversion (SVC)
framework using deep neural network, where multiple features from the speech of two …

Analysis and synthesis of strong vocal expressions: Extension and application of audio texture features to singing voice

H Kawahara, M Morise - 2012 IEEE International Conference …, 2012 - ieeexplore.ieee.org
Realistic reconstruction and manipulation of strong vocal expressions found in singing
voices is a challenging and exciting topic. A speech analysis, modification and resynthesis …

MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-level Feature Fusion

WJ Chung, D Kim, SW Chung, HG Kang - arXiv preprint arXiv:2306.09640, 2023 - arxiv.org
We introduce Multi-level feature Fusion-based Periodicity Analysis Model (MF-PAM), a novel
deep learning-based pitch estimation model that accurately estimates pitch trajectory in …

Continuous vocoder in feed-forward deep neural network based speech synthesis

MS Al-Radhi, TG Csapó, G Németh - Proceedings of digital …, 2017 - malradhi.github.io
Deep neural networks http://smartlab.tmit.bme.hu Continuous vocoder in feed-forward
deep neural network based speech synthesis Mohammed Salah Al-Radhi, Tamás Gábor …

A Fast and Accurate Fundamental Frequency Estimator Using Recursive Moving Average Filters.

R Daido, Y Hisaminato - INTERSPEECH, 2016 - isca-archive.org
We propose a fundamental frequency (F0) estimation method which is fast, accurate and
suitable for real-time use. While the proposed method is based on the same framework as …