Authors
Efthymios Georgiou, Kosmas Kritsis, Georgios Paraskevopoulos, Athanasios Katsamanis, Vassilis Katsouros, Alexandros Potamianos
Publication date
2023/1/9
Conference
2022 IEEE Spoken Language Technology Workshop (SLT)
Pages
977-983
Publisher
IEEE
Description
Deep learning Text-to-Speech (TTS) systems have achieved impressive generated speech quality, close to human parity. However, they suffer from training stability issues and incorrect alignment between the intermediate acoustic representation and the text input. In this work, we propose Regotron, a regularized version of Tacotron2 that alleviates these training issues by augmenting the objective function with an additional term that penalizes non-monotonic alignments in the location-sensitive attention mechanism. We demonstrate that this regularization term stabilizes the training process, produces a monotonic attention sooner (13% of the total number of epochs compared to Tacotron2) and reduces alignment errors during inference. Moreover, Regotron has minimal additional computational overhead, reduces common TTS mistakes and at the same time achieves improved …
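
The idea of adding a penalty for non-monotonic alignments to the training objective can be illustrated with a small sketch. This is not the paper's formulation, which the abstract does not give. It assumes, purely for illustration, that monotonicity is measured on the attention centroid (the expected encoder position at each decoder step) and that backward moves are penalized with a hinge. The function names and the `weight` and `margin` parameters are hypothetical.

```python
def attention_centroids(alignment):
    """Expected encoder position attended to at each decoder step.

    alignment: list of rows, one per decoder step; each row holds the
    attention weights over encoder positions (assumed to sum to 1).
    """
    return [sum(j * w for j, w in enumerate(row)) for row in alignment]


def monotonicity_penalty(alignment, margin=0.0):
    """Hinge penalty on decoder steps whose centroid fails to advance.

    Illustrative assumption, not the published Regotron loss: step t is
    penalized by max(0, c[t] - c[t+1] + margin).
    """
    c = attention_centroids(alignment)
    return sum(max(0.0, c[t] - c[t + 1] + margin) for t in range(len(c) - 1))


def regularized_loss(base_loss, alignment, weight=1.0, margin=0.0):
    """Base objective plus the weighted alignment penalty (hypothetical)."""
    return base_loss + weight * monotonicity_penalty(alignment, margin)


# A diagonal alignment moves strictly forward; the second one jumps back.
monotonic = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
backward = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

penalty_monotonic = monotonicity_penalty(monotonic)
penalty_backward = monotonicity_penalty(backward)
```

In this sketch a forward-moving alignment incurs no penalty, while any step whose centroid moves backward contributes in proportion to the size of the jump, so the extra term only affects training when the attention is misbehaving.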
Scholar articles
E Georgiou, K Kritsis, G Paraskevopoulos… - 2022 IEEE Spoken Language Technology Workshop …, 2023