Fused acoustic and text encoding for multimodal bilingual pretraining and speech translation
Recently, representation learning for text and speech has successfully improved many
language related tasks. However, all existing methods suffer from two limitations: (a) they …
PaddleSpeech: An easy-to-use all-in-one speech toolkit
PaddleSpeech is an open-source all-in-one speech toolkit. It aims at facilitating the
development and research of speech processing technologies by providing an easy-to-use …
Direct simultaneous speech-to-text translation assisted by synchronized streaming ASR
Simultaneous speech-to-text translation is widely useful in many scenarios. The
conventional cascaded approach uses a pipeline of streaming ASR followed by …
Incremental text-to-speech synthesis with prefix-to-prefix framework
Text-to-speech synthesis (TTS) has witnessed rapid progress in recent years, where neural
methods became capable of producing audios with high naturalness. However, these efforts …
ELITR multilingual live subtitling: Demo and strategy
This paper presents an automatic speech translation system aimed at live subtitling of
conference presentations. We describe the overall architecture and key processing …
Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach
Simultaneous Speech-to-Text translation serves a critical role in real-time crosslingual
communication. Despite the advancements in recent years, challenges remain in achieving …
Low-latency incremental text-to-speech synthesis with distilled context prediction network
T Saeki, S Takamichi… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
Incremental text-to-speech (TTS) synthesis generates utterances in small linguistic units for
the sake of real-time and low-latency applications. We previously proposed an incremental …
Barriers to Effective Evaluation of Simultaneous Interpretation
Simultaneous interpretation is an especially challenging form of translation because it
requires converting speech from one language to another in real-time. Though prior work …
Direct simultaneous speech-to-speech translation with variational monotonic multihead attention
We present a direct simultaneous speech-to-speech translation (Simul-S2ST) model,
Furthermore, the generation of translation is independent from intermediate text …
End-to-End Simultaneous Speech Translation
X Ma - 2022 - jscholarship.library.jhu.edu
Speech translation is the task of translating speech in one language to text or speech in
another language, while simultaneous translation aims at lower translation latency by …