SkinAugment: Auto-encoding speaker conversions for automatic speech translation

C Xu, R Ye, Q Dong, C Zhao, T Ko, M Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

Recently, speech-to-text translation has attracted more and more attention and many studies
have emerged rapidly. In this paper, we present a comprehensive survey on direct speech …

Cited by 13 - Related articles - All 4 versions

M³ST: Mix at Three Levels for Speech Translation

X Cheng, Q Dong, F Yue, T Ko… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

How to solve the data scarcity problem for end-to-end speech-to-text translation (ST)? It's
well known that data augmentation is an efficient method to improve performance for many …

Cited by 50 - Related articles - All 3 versions

The SIGMORPHON 2020 shared task on multilingual grapheme-to-phoneme conversion

K Gorman, LFE Ashby, A Goyzueta… - Proceedings of the …, 2020 - aclanthology.org

We describe the design and findings of the SIGMORPHON 2020 shared task on multilingual
grapheme-to-phoneme conversion. Participants were asked to submit systems which take in …

Cited by 61 - Related articles - All 6 versions

Sample, translate, recombine: Leveraging audio alignments for data augmentation in end-to-end speech translation

TK Lam, S Schamoni, S Riezler - arXiv preprint arXiv:2203.08757, 2022 - arxiv.org

End-to-end speech translation relies on data that pair source-language speech inputs with
corresponding translations into a target language. Such data are notoriously scarce, making …

Cited by 25 - Related articles - All 6 versions

Large-scale self- and semi-supervised learning for speech translation

C Wang, A Wu, J Pino, A Baevski, M Auli… - arXiv preprint arXiv …, 2021 - arxiv.org

In this paper, we improve speech translation (ST) through effectively leveraging large
quantities of unlabeled speech and text data in different and complementary ways. We …

Cited by 40 - Related articles - All 8 versions

Leveraging pseudo-labeled data to improve direct speech-to-speech translation

Q Dong, F Yue, T Ko, M Wang, Q Bai… - arXiv preprint arXiv …, 2022 - arxiv.org

Direct Speech-to-speech translation (S2ST) has drawn more and more attention recently.
The task is very challenging due to data scarcity and complex speech-to-speech mapping. In …

Cited by 19 - Related articles - All 7 versions

Effectively pretraining a speech translation decoder with machine translation data

A Alinejad, A Sarkar - Proceedings of the 2020 Conference on …, 2020 - aclanthology.org

Directly translating from speech to text using an end-to-end approach is still challenging for
many language pairs due to insufficient data. Although pretraining the encoder parameters …

Cited by 34 - Related articles

Translatotron 2: Robust direct speech-to-speech translation

Y Jia, MT Ramanovich, T Remez, R Pomerantz - 2021 - openreview.net

We present Translatotron 2, a neural direct speech-to-speech translation model that can be
trained end-to-end. Translatotron 2 consists of a speech encoder, a phoneme decoder, a …

Cited by 26 - Related articles

Learning when to translate for streaming speech

Q Dong, Y Zhu, M Wang, L Li - arXiv preprint arXiv:2109.07368, 2021 - arxiv.org

How to find proper moments to generate partial sentence translation given a streaming
speech input? Existing approaches waiting-and-translating for a fixed duration often break …

Cited by 22 - Related articles - All 8 versions

Self-supervised representations improve end-to-end speech translation

A Wu, C Wang, J Pino, J Gu - arXiv preprint arXiv:2006.12124, 2020 - arxiv.org

End-to-end speech-to-text translation can provide a simpler and smaller system but is facing
the challenge of data scarcity. Pre-training methods can leverage unlabeled data and have …

Cited by 36 - Related articles - All 8 versions