Cascaded models with cyclic feedback for direct speech translation

Q Fang, R Ye, L Li, Y Feng, M Wang - arXiv preprint arXiv:2203.10426, 2022 - arxiv.org

How to learn a better speech representation for end-to-end speech-to-text translation (ST)
with limited labeled data? Existing techniques often attempt to transfer powerful machine …

Cited by 89 - Related articles - All 8 versions

[PDF] arxiv.org

Sample, translate, recombine: Leveraging audio alignments for data augmentation in end-to-end speech translation

TK Lam, S Schamoni, S Riezler - arXiv preprint arXiv:2203.08757, 2022 - arxiv.org

End-to-end speech translation relies on data that pair source-language speech inputs with
corresponding translations into a target language. Such data are notoriously scarce, making …

Cited by 25 - Related articles - All 6 versions

[PDF] arxiv.org

LLaST: Improved end-to-end speech translation system leveraged by large language models

X Chen, S Zhang, Q Bai, K Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce LLaST, a framework for building high-performance Large Language model
based Speech-to-text Translation systems. We address the limitations of end-to-end speech …

Cited by 4 - Related articles - All 3 versions

[PDF] arxiv.org

Non-parametric domain adaptation for end-to-end speech translation

Y Du, W Wang, Z Zhang, B Chen, T Xu, J Xie… - arXiv preprint arXiv …, 2022 - arxiv.org

End-to-End Speech Translation (E2E-ST) has received increasing attention due to the
potential of its less error propagation, lower latency, and fewer parameters. However, the …

Cited by 15 - Related articles - All 3 versions

A multitask co-training framework for improving speech translation by leveraging speech recognition and machine translation tasks

Y Zhou, Y Yuan, X Shi - Neural Computing and Applications, 2024 - Springer

End-to-end speech translation (ST) has attracted substantial attention due to its less error
accumulation and lower latency. Based on triplet ST data ⟨speech-transcription …

Cited by 1 - Related articles

[PDF] arxiv.org

Improving speech translation by fusing speech and text

W Yin, Z Liu, C Zhao, T Wang, J Tong, R Ye - arXiv preprint arXiv …, 2023 - arxiv.org

In speech translation, leveraging multimodal data to improve model performance and
address limitations of individual modalities has shown significant effectiveness. In this paper …

Cited by 2 - Related articles - All 4 versions

[PDF] arxiv.org

Align, write, re-order: Explainable end-to-end speech translation via operation sequence generation

M Omachi, B Yan, S Dalmia, Y Fujita… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

The black-box nature of end-to-end speech-to-text translation (E2E ST) makes it difficult to
understand how source language inputs are being mapped to the target language. To solve …

Cited by 3 - Related articles - All 5 versions

[PDF] aclanthology.org

FCGCL: Fine- and coarse-granularity contrastive learning for speech translation

H Zhang, N Si, Y Chen, Z Li, T Niu… - Findings of the …, 2022 - aclanthology.org

It is notoriously difficult to implement end-to-end speech translation (E2E-ST) model
because of the task complexity and data scarcity. Existing techniques often attempt to carry …

Cited by 2 - Related articles

[PDF] aalto.fi

Attention-based End-to-End Models in Language Technology

A Rouhe - 2024 - aaltodoc.aalto.fi

Speech recognition specifically, and language technology more generally, have started to
find everyday use. Challenging language tasks have become feasible through a continued …

Cited by 1 - Related articles - All 2 versions