A multi-level representation of f0 using the continuous wavelet transform and the discrete cosine transform
MS Ribeiro, RAJ Clark - 2015 IEEE International Conference …, 2015 - ieeexplore.ieee.org
We propose a representation of f0 using the Continuous Wavelet Transform (CWT) and the
Discrete Cosine Transform (DCT). The CWT decomposes the signal into various scales of …
Phonological vocoding using artificial neural networks
We investigate a vocoder based on artificial neural networks using a phonological speech
representation. Speech decomposition is based on the phonological encoders, realised as …
Decision tree usage for incremental parametric speech synthesis
T Baumann - 2014 IEEE International Conference on Acoustics …, 2014 - ieeexplore.ieee.org
Human speakers plan and deliver their utterances incrementally, piece-by-piece, and it is
obvious that their choice regarding phonetic details (and the details' peculiarities) is rarely …
A perceptual investigation of wavelet-based decomposition of f0 for text-to-speech synthesis
MS Ribeiro, J Yamagishi… - INTERSPEECH 2015 16th …, 2015 - research.ed.ac.uk
Abstract The Continuous Wavelet Transform (CWT) has been recently proposed to model f0
in the context of speech synthesis. It was shown that systems using signal decomposition …
Preliminary work on speaker adaptation for DNN-based speech synthesis
We investigate speaker adaptation in the context of deep neural network (DNN) based
speech synthesis. More specifically, our current work focuses on the exploitation of auxiliary …
Partial representations improve the prosody of incremental speech synthesis
T Baumann - 2014 - edoc.sub.uni-hamburg.de
When humans speak, they do not plan their full utterance in all detail before beginning to
speak, nor do they speak piece-by-piece and ignoring their full message–instead humans …
Learning word vector representations based on acoustic counts
MS Ribeiro, O Watts, J Yamagishi - Interspeech 2017, 2017 - research.ed.ac.uk
This paper presents a simple count-based approach to learning word vector representations
by leveraging statistics of cooccurrences between text and speech. This type of …
Syllable-Level Representations of Suprasegmental Features for DNN-Based Text-to-Speech Synthesis
MS Ribeiro, O Watts, J Yamagishi - INTERSPEECH, 2016 - cstr.ed.ac.uk
A top-down hierarchical system based on deep neural networks is investigated for the
modeling of prosody in speech synthesis. Suprasegmental features are processed …
A small-footprint context-independent HMM-based synthesizer for Tamil
G Anushiya Rachel, V Sherlin Solomi… - International Journal of …, 2015 - Springer
A text-to-speech synthesis system produces intelligible and natural speech corresponding to
any given text. Two main attributes of a synthesizer are the quality of speech produced and …
Incremental syllable-context phonetic vocoding
Current very low bit rate speech coders are, due to complexity limitations, designed to work
off-line. This paper investigates incremental speech coding that operates real-time and …