STEMM: Self-learning with speech-text manifold mixup for speech translation

Q Fang, R Ye, L Li, Y Feng, M Wang - arXiv preprint arXiv:2203.10426, 2022 - arxiv.org
How to learn a better speech representation for end-to-end speech-to-text translation (ST)
with limited labeled data? Existing techniques often attempt to transfer powerful machine …

Sample, translate, recombine: Leveraging audio alignments for data augmentation in end-to-end speech translation

TK Lam, S Schamoni, S Riezler - arXiv preprint arXiv:2203.08757, 2022 - arxiv.org
End-to-end speech translation relies on data that pair source-language speech inputs with
corresponding translations into a target language. Such data are notoriously scarce, making …

LLaST: Improved end-to-end speech translation system leveraged by large language models

X Chen, S Zhang, Q Bai, K Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce LLaST, a framework for building high-performance Large Language model
based Speech-to-text Translation systems. We address the limitations of end-to-end speech …

Non-parametric domain adaptation for end-to-end speech translation

Y Du, W Wang, Z Zhang, B Chen, T Xu, J Xie… - arXiv preprint arXiv …, 2022 - arxiv.org
End-to-End Speech Translation (E2E-ST) has received increasing attention due to the
potential of its less error propagation, lower latency, and fewer parameters. However, the …

A multitask co-training framework for improving speech translation by leveraging speech recognition and machine translation tasks

Y Zhou, Y Yuan, X Shi - Neural Computing and Applications, 2024 - Springer
End-to-end speech translation (ST) has attracted substantial attention due to its less error
accumulation and lower latency. Based on triplet ST data ⟨speech-transcription …

Improving speech translation by fusing speech and text

W Yin, Z Liu, C Zhao, T Wang, J Tong, R Ye - arXiv preprint arXiv …, 2023 - arxiv.org
In speech translation, leveraging multimodal data to improve model performance and
address limitations of individual modalities has shown significant effectiveness. In this paper …

Align, write, re-order: Explainable end-to-end speech translation via operation sequence generation

M Omachi, B Yan, S Dalmia, Y Fujita… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
The black-box nature of end-to-end speech-to-text translation (E2E ST) makes it difficult to
understand how source language inputs are being mapped to the target language. To solve …

FCGCL: Fine- and coarse-granularity contrastive learning for speech translation

H Zhang, N Si, Y Chen, Z Li, T Niu… - Findings of the …, 2022 - aclanthology.org
It is notoriously difficult to implement an end-to-end speech translation (E2E-ST) model
because of the task complexity and data scarcity. Existing techniques often attempt to carry …

Attention-based End-to-End Models in Language Technology

A Rouhe - 2024 - aaltodoc.aalto.fi
Speech recognition specifically, and language technology more generally, have started to
find everyday use. Challenging language tasks have become feasible through a continued …