A multi-level representation of f0 using the continuous wavelet transform and the discrete cosine transform

MS Ribeiro, RAJ Clark - 2015 IEEE International Conference …, 2015 - ieeexplore.ieee.org
We propose a representation of f0 using the Continuous Wavelet Transform (CWT) and the
Discrete Cosine Transform (DCT). The CWT decomposes the signal into various scales of …

Phonological vocoding using artificial neural networks

M Cernak, B Potard, PN Garner - 2015 IEEE International …, 2015 - ieeexplore.ieee.org
We investigate a vocoder based on artificial neural networks using a phonological speech
representation. Speech decomposition is based on the phonological encoders, realised as …

Decision tree usage for incremental parametric speech synthesis

T Baumann - 2014 IEEE International Conference on Acoustics …, 2014 - ieeexplore.ieee.org
Human speakers plan and deliver their utterances incrementally, piece-by-piece, and it is
obvious that their choice regarding phonetic details (and the details' peculiarities) is rarely …

A perceptual investigation of wavelet-based decomposition of f0 for text-to-speech synthesis

MS Ribeiro, J Yamagishi… - INTERSPEECH 2015 16th …, 2015 - research.ed.ac.uk
The Continuous Wavelet Transform (CWT) has been recently proposed to model f0
in the context of speech synthesis. It was shown that systems using signal decomposition …

Preliminary work on speaker adaptation for DNN-based speech synthesis

B Potard, P Motlicek, D Imseng - 2015 - infoscience.epfl.ch
We investigate speaker adaptation in the context of deep neural network (DNN) based
speech synthesis. More specifically, our current work focuses on the exploitation of auxiliary …

Partial representations improve the prosody of incremental speech synthesis

T Baumann - 2014 - edoc.sub.uni-hamburg.de
When humans speak, they do not plan their full utterance in all detail before beginning to
speak, nor do they speak piece-by-piece and ignoring their full message – instead humans …

Learning word vector representations based on acoustic counts

MS Ribeiro, O Watts, J Yamagishi - Interspeech 2017, 2017 - research.ed.ac.uk
This paper presents a simple count-based approach to learning word vector representations
by leveraging statistics of cooccurrences between text and speech. This type of …

Syllable-Level Representations of Suprasegmental Features for DNN-Based Text-to-Speech Synthesis

MS Ribeiro, O Watts, J Yamagishi - INTERSPEECH, 2016 - cstr.ed.ac.uk
A top-down hierarchical system based on deep neural networks is investigated for the
modeling of prosody in speech synthesis. Suprasegmental features are processed …

A small-footprint context-independent HMM-based synthesizer for Tamil

G Anushiya Rachel, V Sherlin Solomi… - International Journal of …, 2015 - Springer
A text-to-speech synthesis system produces intelligible and natural speech corresponding to
any given text. Two main attributes of a synthesizer are the quality of speech produced and …

Incremental syllable-context phonetic vocoding

M Cernak, PN Garner, A Lazaridis… - … /ACM Transactions on …, 2015 - ieeexplore.ieee.org
Current very low bit rate speech coders are, due to complexity limitations, designed to work
off-line. This paper investigates incremental speech coding that operates real-time and …