STEMM: Self-learning with speech-text manifold mixup for speech translation
How to learn a better speech representation for end-to-end speech-to-text translation (ST)
with limited labeled data? Existing techniques often attempt to transfer powerful machine …
The Multilingual TEDx corpus for speech recognition and translation
We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and
speech translation (ST) research across many non-English source languages. The corpus is …
Cascade versus direct speech translation: Do the differences still make a difference?
Five years after the first published proofs of concept, direct approaches to speech translation
(ST) are now competing with traditional cascade solutions. In light of this steady progress …
Learning shared semantic space for speech-to-text translation
Having numerous potential applications and great impact, end-to-end speech translation
(ST) has long been treated as an independent task, failing to fully draw strength from the …
Speech translation and the end-to-end promise: Taking stock of where we are
M Sperber, M Paulik - arXiv preprint arXiv:2004.06358, 2020 - arxiv.org
Over its three decade history, speech translation has experienced several shifts in its
primary research themes; moving from loosely coupled cascades of speech recognition and …
Stacked acoustic-and-textual encoding: Integrating the pre-trained models into speech translation encoders
Encoder pre-training is promising in end-to-end Speech Translation (ST), given the fact that
speech-to-translation data is scarce. But ST encoders are not simple instances of Automatic …
Listen, understand and translate: Triple supervision decouples end-to-end speech-to-text translation
An end-to-end speech-to-text translation (ST) takes audio in a source language and outputs
the text in a target language. Existing methods are limited by the amount of parallel corpus …
Multimodal machine translation through visuals and speech
Multimodal machine translation involves drawing information from more than one modality,
based on the assumption that the additional modalities will contain useful alternative views …
CoVoST: A diverse multilingual speech-to-text translation corpus
Spoken language translation has recently witnessed a resurgence in popularity, thanks to
the development of end-to-end models and the creation of new corpora, such as Augmented …
Self-training for end-to-end speech translation
One of the main challenges for end-to-end speech translation is data scarcity. We leverage
pseudo-labels generated from unlabeled audio by a cascade and an end-to-end speech …