- Academic resource search

STEMM: Self-learning with speech-text manifold mixup for speech translation

Q Fang, R Ye, L Li, Y Feng, M Wang - arXiv preprint arXiv:2203.10426, 2022 - arxiv.org

How to learn a better speech representation for end-to-end speech-to-text translation (ST)
with limited labeled data? Existing techniques often attempt to transfer powerful machine …

Cited by 90 - Related articles - All 8 versions

The Multilingual TEDx corpus for speech recognition and translation

E Salesky, M Wiesner, J Bremerman, R Cattoni… - arXiv preprint arXiv …, 2021 - arxiv.org

We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and
speech translation (ST) research across many non-English source languages. The corpus is …

Cited by 131 - Related articles - All 12 versions

Cascade versus direct speech translation: Do the differences still make a difference?

L Bentivogli, M Cettolo, M Gaido, A Karakanta… - arXiv preprint arXiv …, 2021 - arxiv.org

Five years after the first published proofs of concept, direct approaches to speech translation
(ST) are now competing with traditional cascade solutions. In light of this steady progress …

Cited by 73 - Related articles - All 11 versions

Learning shared semantic space for speech-to-text translation

C Han, M Wang, H Ji, L Li - arXiv preprint arXiv:2105.03095, 2021 - arxiv.org

Having numerous potential applications and great impact, end-to-end speech translation
(ST) has long been treated as an independent task, failing to fully draw strength from the …

Cited by 75 - Related articles - All 7 versions

Speech translation and the end-to-end promise: Taking stock of where we are

M Sperber, M Paulik - arXiv preprint arXiv:2004.06358, 2020 - arxiv.org

Over its three decade history, speech translation has experienced several shifts in its
primary research themes; moving from loosely coupled cascades of speech recognition and …

Cited by 101 - Related articles - All 4 versions

Stacked acoustic-and-textual encoding: Integrating the pre-trained models into speech translation encoders

C Xu, B Hu, Y Li, Y Zhang, Q Ju, T Xiao, J Zhu - arXiv preprint arXiv …, 2021 - arxiv.org

Encoder pre-training is promising in end-to-end Speech Translation (ST), given the fact that
speech-to-translation data is scarce. But ST encoders are not simple instances of Automatic …

Cited by 63 - Related articles - All 5 versions

Listen, understand and translate: Triple supervision decouples end-to-end speech-to-text translation

Q Dong, R Ye, M Wang, H Zhou, S Xu, B Xu… - Proceedings of the AAAI …, 2021 - ojs.aaai.org

An end-to-end speech-to-text translation (ST) takes audio in a source language and outputs
the text in a target language. Existing methods are limited by the amount of parallel corpus …

Cited by 61 - Related articles - All 8 versions

Multimodal machine translation through visuals and speech

U Sulubacak, O Caglayan, SA Grönroos, A Rouhe… - Machine …, 2020 - Springer

Multimodal machine translation involves drawing information from more than one modality,
based on the assumption that the additional modalities will contain useful alternative views …

Cited by 86 - Related articles - All 18 versions

CoVoST: A diverse multilingual speech-to-text translation corpus

C Wang, J Pino, A Wu, J Gu - arXiv preprint arXiv:2002.01320, 2020 - arxiv.org

Spoken language translation has recently witnessed a resurgence in popularity, thanks to
the development of end-to-end models and the creation of new corpora, such as Augmented …

Cited by 81 - Related articles - All 5 versions

Self-training for end-to-end speech translation

J Pino, Q Xu, X Ma, MJ Dousti, Y Tang - arXiv preprint arXiv:2006.02490, 2020 - arxiv.org

One of the main challenges for end-to-end speech translation is data scarcity. We leverage
pseudo-labels generated from unlabeled audio by a cascade and an end-to-end speech …

Cited by 62 - Related articles - All 8 versions