ByT5 model for massively multilingual grapheme-to-phoneme conversion

J Belouadi, S Eger - arXiv preprint arXiv:2212.10474, 2022 - arxiv.org

State-of-the-art poetry generation systems are often complex. They either consist of task-specific model pipelines, incorporate prior knowledge in the form of manually created …

Cited by 17 · All 5 versions

Pronunciation Dictionary-Free Multilingual Speech Synthesis Using Learned Phonetic Representations

C Liu, ZH Ling, LH Chen - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org

This article presents a multilingual speech synthesis approach that leverages learned phonetic representations to eliminate the need for pronunciation dictionaries in target …

Cited by 3 · All 2 versions

T5g2p: Text-to-text transfer transformer based grapheme-to-phoneme conversion

M Řezáčková, D Tihelka… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org

The present paper explores the use of several deep neural network architectures to carry out a grapheme-to-phoneme (G2P) conversion, aiming to find a universal and language …

Cited by 1 · All 2 versions

Speak While You Think: Streaming Speech Synthesis During Text Generation

A Dekel, S Shechtman, R Fernandez… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Large Language Models (LLMs) demonstrate impressive capabilities, yet interaction with these models is mostly facilitated through text. Using Text-To-Speech to synthesize LLM …

Cited by 4 · All 4 versions

XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech

LT Nguyen, T Pham, DQ Nguyen - arXiv preprint arXiv:2305.19709, 2023 - arxiv.org

We present XPhoneBERT, the first multilingual model pre-trained to learn phoneme representations for the downstream text-to-speech (TTS) task. Our XPhoneBERT has the …

Cited by 3

Improving Seq2Seq TTS frontends with transcribed speech audio

S Sun, K Richmond, H Tang - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org

Due to the data inefficiency and low speech quality of grapheme-based end-to-end text-to-speech (TTS), having a separate high-performance TTS linguistic frontend is still commonly …

Cited by 4 · All 4 versions

Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings

MS Ribeiro, G Comini, J Lorenzo-Trueba - arXiv preprint arXiv:2307.16643, 2023 - arxiv.org

The Grapheme-to-Phoneme (G2P) task aims to convert orthographic input into a discrete phonetic representation. G2P conversion is beneficial to various speech processing …

Cited by 3 · All 5 versions

The taste of IPA: Towards open-vocabulary keyword spotting and forced alignment in any language

J Zhu, C Yang, F Samir, J Islam - … of the 2024 Conference of the …, 2024 - aclanthology.org

In this project, we demonstrate that phoneme-based models for speech processing can achieve strong crosslinguistic generalizability to unseen languages. We curated the …

Cited by 3

Learning pronunciation from other accents via pronunciation knowledge transfer

S Sun, K Richmond - Interspeech, 2024 - isca-archive.org

Bootstrapping has proven to be effective in transforming a conventional pipeline-based linguistic frontend to an integrated Sequence-to-Sequence (Seq2Seq) frontend for text-to …

Cited by 1 · All 3 versions

Data Driven Grapheme-to-Phoneme Representations for a Lexicon-Free Text-to-Speech

A Garg, J Kim, S Khyalia, C Kim… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Grapheme-to-Phoneme (G2P) is an essential first step in any modern, high-quality Text-to-Speech (TTS) system. Most of the current G2P systems rely on carefully hand-crafted …

Cited by 2 · All 3 versions