ByGPT5: End-to-end style-conditioned poetry generation with token-free language models
J Belouadi, S Eger - arXiv preprint arXiv:2212.10474, 2022 - arxiv.org
State-of-the-art poetry generation systems are often complex. They either consist of task-
specific model pipelines, incorporate prior knowledge in the form of manually created …
Pronunciation Dictionary-Free Multilingual Speech Synthesis Using Learned Phonetic Representations
This article presents a multilingual speech synthesis approach that leverages learned
phonetic representations to eliminate the need for pronunciation dictionaries in target …
T5G2P: Text-to-text transfer transformer based grapheme-to-phoneme conversion
M Řezáčková, D Tihelka… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
The present paper explores the use of several deep neural network architectures to carry out
a grapheme-to-phoneme (G2P) conversion, aiming to find a universal and language …
Speak While You Think: Streaming Speech Synthesis During Text Generation
Large Language Models (LLMs) demonstrate impressive capabilities, yet interaction with
these models is mostly facilitated through text. Using Text-To-Speech to synthesize LLM …
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech
We present XPhoneBERT, the first multilingual model pre-trained to learn phoneme
representations for the downstream text-to-speech (TTS) task. Our XPhoneBERT has the …
Improving seq2seq TTS frontends with transcribed speech audio
Due to the data inefficiency and low speech quality of grapheme-based end-to-end text-to-
speech (TTS), having a separate high-performance TTS linguistic frontend is still commonly …
Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings
The Grapheme-to-Phoneme (G2P) task aims to convert orthographic input into a discrete
phonetic representation. G2P conversion is beneficial to various speech processing …
The taste of IPA: Towards open-vocabulary keyword spotting and forced alignment in any language
In this project, we demonstrate that phoneme-based models for speech processing can
achieve strong crosslinguistic generalizability to unseen languages. We curated the …
Learning pronunciation from other accents via pronunciation knowledge transfer
S Sun, K Richmond - Interspeech, 2024 - isca-archive.org
Bootstrapping has proven to be effective in transforming a conventional pipeline-based
linguistic frontend to an integrated Sequence-to-Sequence (Seq2Seq) frontend for text-to …
Data Driven Grapheme-to-Phoneme Representations for a Lexicon-Free Text-to-Speech
Grapheme-to-Phoneme (G2P) is an essential first step in any modern, high-quality Text-to-
Speech (TTS) system. Most of the current G2P systems rely on carefully hand-crafted …