A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Glow-TTS: A generative flow for text-to-speech via monotonic alignment search

J Kim, S Kim, J Kong, S Yoon - Advances in Neural …, 2020 - proceedings.neurips.cc
Recently, text-to-speech (TTS) models such as FastSpeech and ParaNet have been
proposed to generate mel-spectrograms from text in parallel. Despite the advantage, the …

Review of end-to-end speech synthesis technology based on deep learning

Z Mu, X Yang, Y Dong - arXiv preprint arXiv:2104.09995, 2021 - arxiv.org
As an indispensable part of modern human-computer interaction system, speech synthesis
technology helps users get the output of intelligent machine more easily and intuitively, thus …

Speech-T: Transducer for text to speech and beyond

J Chen, X Tan, Y Leng, J Xu, G Wen… - Advances in Neural …, 2021 - proceedings.neurips.cc
Neural Transducer (e.g., RNN-T) has been widely used in automatic speech
recognition (ASR) due to its capabilities of efficiently modeling monotonic alignments …

TDASS: Target domain adaptation speech synthesis framework for multi-speaker low-resource TTS

X Zhang, J Wang, N Cheng… - 2022 International Joint …, 2022 - ieeexplore.ieee.org
Recently, synthesizing personalized speech by text-to-speech (TTS) application is highly
demanded. But the previous TTS models require a mass of target speaker speeches for …

Incremental text-to-speech synthesis using pseudo lookahead with large pretrained language model

T Saeki, S Takamichi… - IEEE Signal Processing …, 2021 - ieeexplore.ieee.org
This letter presents an incremental text-to-speech (TTS) method that performs synthesis in
small linguistic units while maintaining the naturalness of output speech. Incremental TTS is …

What the future brings: Investigating the impact of lookahead for incremental neural TTS

B Stephenson, L Besacier, L Girin, T Hueber - arXiv preprint arXiv …, 2020 - arxiv.org
In incremental text to speech synthesis (iTTS), the synthesizer produces an audio output
before it has access to the entire input sentence. In this paper, we study the behavior of a …

A machine speech chain approach for dynamically adaptive Lombard TTS in static and dynamic noise environments

S Novitasari, S Sakti… - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org
Recent end-to-end text-to-speech synthesis (TTS) systems have successfully synthesized
high-quality speech. However, TTS speech intelligibility degrades in noisy environments …

Speak While You Think: Streaming Speech Synthesis During Text Generation

A Dekel, S Shechtman, R Fernandez… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Large Language Models (LLMs) demonstrate impressive capabilities, yet interaction with
these models is mostly facilitated through text. Using Text-To-Speech to synthesize LLM …

Incremental text to speech for neural sequence-to-sequence models using reinforcement learning

DSR Mohan, R Lenain, L Foglianti, TH Teh… - arXiv preprint arXiv …, 2020 - arxiv.org
Modern approaches to text to speech require the entire input character sequence to be
processed before any audio is synthesised. This latency limits the suitability of such models …