Fused acoustic and text encoding for multimodal bilingual pretraining and speech translation

R Zheng, J Chen, M Ma… - … Conference on Machine …, 2021 - proceedings.mlr.press
Recently, representation learning for text and speech has successfully improved many
language related tasks. However, all existing methods suffer from two limitations: (a) they …

Paddlespeech: An easy-to-use all-in-one speech toolkit

H Zhang, T Yuan, J Chen, X Li, R Zheng… - arXiv preprint arXiv …, 2022 - arxiv.org
PaddleSpeech is an open-source all-in-one speech toolkit. It aims at facilitating the
development and research of speech processing technologies by providing an easy-to-use …

Direct simultaneous speech-to-text translation assisted by synchronized streaming ASR

J Chen, M Ma, R Zheng, L Huang - arXiv preprint arXiv:2106.06636, 2021 - arxiv.org
Simultaneous speech-to-text translation is widely useful in many scenarios. The
conventional cascaded approach uses a pipeline of streaming ASR followed by …

Incremental text-to-speech synthesis with prefix-to-prefix framework

M Ma, B Zheng, K Liu, R Zheng, H Liu, K Peng… - arXiv preprint arXiv …, 2019 - arxiv.org
Text-to-speech synthesis (TTS) has witnessed rapid progress in recent years, where neural
methods became capable of producing audios with high naturalness. However, these efforts …

ELITR multilingual live subtitling: Demo and strategy

O Bojar, D Macháček, S Sagar, O Smrž, J Kratochvíl… - 2021 - zora.uzh.ch
This paper presents an automatic speech translation system aimed at live subtitling of
conference presentations. We describe the overall architecture and key processing …

Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach

J Chen, J Xue, P Wang, J Pan… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Simultaneous Speech-to-Text translation serves a critical role in real-time crosslingual
communication. Despite the advancements in recent years, challenges remain in achieving …

Low-latency incremental text-to-speech synthesis with distilled context prediction network

T Saeki, S Takamichi… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
Incremental text-to-speech (TTS) synthesis generates utterances in small linguistic units for
the sake of real-time and low-latency applications. We previously proposed an incremental …

Barriers to Effective Evaluation of Simultaneous Interpretation

S Wein, I Te, C Cherry, J Juraska… - Findings of the …, 2024 - aclanthology.org
Simultaneous interpretation is an especially challenging form of translation because it
requires converting speech from one language to another in real-time. Though prior work …

Direct simultaneous speech-to-speech translation with variational monotonic multihead attention

X Ma, H Gong, D Liu, A Lee, Y Tang, PJ Chen… - arXiv preprint arXiv …, 2021 - arxiv.org
We present a direct simultaneous speech-to-speech translation (Simul-S2ST) model, …
Furthermore, the generation of translation is independent from intermediate text …

End-to-End Simultaneous Speech Translation

X Ma - 2022 - jscholarship.library.jhu.edu
Speech translation is the task of translating speech in one language to text or speech in
another language, while simultaneous translation aims at lower translation latency by …