Recent advances in direct speech-to-text translation
Recently, speech-to-text translation has attracted more and more attention and many studies
have emerged rapidly. In this paper, we present a comprehensive survey on direct speech …
M3ST: Mix at Three Levels for Speech Translation
How to solve the data scarcity problem for end-to-end speech-to-text translation (ST)? It's
well known that data augmentation is an efficient method to improve performance for many …
The SIGMORPHON 2020 shared task on multilingual grapheme-to-phoneme conversion
K Gorman, LFE Ashby, A Goyzueta… - Proceedings of the …, 2020 - aclanthology.org
We describe the design and findings of the SIGMORPHON 2020 shared task on multilingual
grapheme-to-phoneme conversion. Participants were asked to submit systems which take in …
Sample, translate, recombine: Leveraging audio alignments for data augmentation in end-to-end speech translation
End-to-end speech translation relies on data that pair source-language speech inputs with
corresponding translations into a target language. Such data are notoriously scarce, making …
Large-scale self- and semi-supervised learning for speech translation
In this paper, we improve speech translation (ST) through effectively leveraging large
quantities of unlabeled speech and text data in different and complementary ways. We …
Leveraging pseudo-labeled data to improve direct speech-to-speech translation
Direct Speech-to-speech translation (S2ST) has drawn more and more attention recently.
The task is very challenging due to data scarcity and complex speech-to-speech mapping. In …
Effectively pretraining a speech translation decoder with machine translation data
A Alinejad, A Sarkar - Proceedings of the 2020 Conference on …, 2020 - aclanthology.org
Directly translating from speech to text using an end-to-end approach is still challenging for
many language pairs due to insufficient data. Although pretraining the encoder parameters …
Translatotron 2: Robust direct speech-to-speech translation
We present Translatotron 2, a neural direct speech-to-speech translation model that can be
trained end-to-end. Translatotron 2 consists of a speech encoder, a phoneme decoder, a …
Learning when to translate for streaming speech
How to find proper moments to generate partial sentence translation given a streaming
speech input? Existing approaches waiting-and-translating for a fixed duration often break …
Self-supervised representations improve end-to-end speech translation
End-to-end speech-to-text translation can provide a simpler and smaller system but is facing
the challenge of data scarcity. Pre-training methods can leverage unlabeled data and have …