STEMM: Self-learning with speech-text manifold mixup for speech translation
How to learn a better speech representation for end-to-end speech-to-text translation (ST)
with limited labeled data? Existing techniques often attempt to transfer powerful machine …
Sample, translate, recombine: Leveraging audio alignments for data augmentation in end-to-end speech translation
End-to-end speech translation relies on data that pair source-language speech inputs with
corresponding translations into a target language. Such data are notoriously scarce, making …
LLaST: Improved end-to-end speech translation system leveraged by large language models
We introduce LLaST, a framework for building high-performance Large Language model
based Speech-to-text Translation systems. We address the limitations of end-to-end speech …
Non-parametric domain adaptation for end-to-end speech translation
End-to-End Speech Translation (E2E-ST) has received increasing attention due to the
potential of its less error propagation, lower latency, and fewer parameters. However, the …
A multitask co-training framework for improving speech translation by leveraging speech recognition and machine translation tasks
Y Zhou, Y Yuan, X Shi - Neural Computing and Applications, 2024 - Springer
End-to-end speech translation (ST) has attracted substantial attention due to its less error
accumulation and lower latency. Based on triplet ST data ⟨speech-transcription …
Improving speech translation by fusing speech and text
In speech translation, leveraging multimodal data to improve model performance and
address limitations of individual modalities has shown significant effectiveness. In this paper …
Align, write, re-order: Explainable end-to-end speech translation via operation sequence generation
The black-box nature of end-to-end speech-to-text translation (E2E ST) makes it difficult to
understand how source language inputs are being mapped to the target language. To solve …
FCGCL: Fine- and coarse-granularity contrastive learning for speech translation
H Zhang, N Si, Y Chen, Z Li, T Niu… - Findings of the …, 2022 - aclanthology.org
It is notoriously difficult to implement end-to-end speech translation (E2E-ST) model
because of the task complexity and data scarcity. Existing techniques often attempt to carry …
Attention-based End-to-End Models in Language Technology
A Rouhe - 2024 - aaltodoc.aalto.fi
Speech recognition specifically, and language technology more generally, have started to
find everyday use. Challenging language tasks have become feasible through a continued …