ByGPT5: End-to-end style-conditioned poetry generation with token-free language models

J Belouadi, S Eger - arXiv preprint arXiv:2212.10474, 2022 - arxiv.org
State-of-the-art poetry generation systems are often complex. They either consist of
task-specific model pipelines, incorporate prior knowledge in the form of manually created …

Pronunciation Dictionary-Free Multilingual Speech Synthesis Using Learned Phonetic Representations

C Liu, ZH Ling, LH Chen - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
This article presents a multilingual speech synthesis approach that leverages learned
phonetic representations to eliminate the need for pronunciation dictionaries in target …

T5G2P: Text-to-text transfer transformer based grapheme-to-phoneme conversion

M Řezáčková, D Tihelka… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
The present paper explores the use of several deep neural network architectures to carry out
a grapheme-to-phoneme (G2P) conversion, aiming to find a universal and language …

Speak While You Think: Streaming Speech Synthesis During Text Generation

A Dekel, S Shechtman, R Fernandez… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Large Language Models (LLMs) demonstrate impressive capabilities, yet interaction with
these models is mostly facilitated through text. Using Text-To-Speech to synthesize LLM …

XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech

LT Nguyen, T Pham, DQ Nguyen - arXiv preprint arXiv:2305.19709, 2023 - arxiv.org
We present XPhoneBERT, the first multilingual model pre-trained to learn phoneme
representations for the downstream text-to-speech (TTS) task. Our XPhoneBERT has the …

Improving Seq2Seq TTS frontends with transcribed speech audio

S Sun, K Richmond, H Tang - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org
Due to the data inefficiency and low speech quality of grapheme-based end-to-end
text-to-speech (TTS), having a separate high-performance TTS linguistic frontend is still commonly …

Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings

MS Ribeiro, G Comini, J Lorenzo-Trueba - arXiv preprint arXiv:2307.16643, 2023 - arxiv.org
The Grapheme-to-Phoneme (G2P) task aims to convert orthographic input into a discrete
phonetic representation. G2P conversion is beneficial to various speech processing …

The taste of IPA: Towards open-vocabulary keyword spotting and forced alignment in any language

J Zhu, C Yang, F Samir, J Islam - … of the 2024 Conference of the …, 2024 - aclanthology.org
In this project, we demonstrate that phoneme-based models for speech processing can
achieve strong crosslinguistic generalizability to unseen languages. We curated the …

Learning pronunciation from other accents via pronunciation knowledge transfer

S Sun, K Richmond - Interspeech, 2024 - isca-archive.org
Bootstrapping has proven to be effective in transforming a conventional pipeline-based
linguistic frontend to an integrated Sequence-to-Sequence (Seq2Seq) frontend for text-to …

Data Driven Grapheme-to-Phoneme Representations for a Lexicon-Free Text-to-Speech

A Garg, J Kim, S Khyalia, C Kim… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Grapheme-to-Phoneme (G2P) is an essential first step in any modern, high-quality
Text-to-Speech (TTS) system. Most of the current G2P systems rely on carefully hand-crafted …