Direct simultaneous speech-to-text translation assisted by synchronized streaming ASR

R Zheng, J Chen, M Ma… - … Conference on Machine …, 2021 - proceedings.mlr.press

Recently, representation learning for text and speech has successfully improved many
language related tasks. However, all existing methods suffer from two limitations: (a) they …

Cited by 65 Related articles All 4 versions

Unified segment-to-segment framework for simultaneous sequence generation

S Zhang, Y Feng - Advances in Neural Information …, 2024 - proceedings.neurips.cc

Simultaneous sequence generation is a pivotal task for real-time scenarios, such as
streaming speech recognition, simultaneous machine translation and simultaneous speech …

Cited by 3 Related articles All 5 versions

End-to-End Speech-to-Text Translation: A Survey

N Sethiya, CK Maurya - arXiv preprint arXiv:2312.01053, 2023 - arxiv.org

Speech-to-text translation pertains to the task of converting speech signals in a language to
text in another language. It finds its application in various domains, such as hands-free …

Cited by 2 Related articles All 2 versions

ESPnet-ST-v2: Multipurpose spoken language translation toolkit

B Yan, J Shi, Y Tang, H Inaguma, Y Peng… - arXiv preprint arXiv …, 2023 - arxiv.org

ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitated by the
broadening interests of the spoken language translation community. ESPnet-ST-v2 supports …

Cited by 8 Related articles All 7 versions

Attention as a guide for simultaneous speech translation

S Papi, M Negri, M Turchi - arXiv preprint arXiv:2212.07850, 2022 - arxiv.org

The study of the attention mechanism has sparked interest in many fields, such as language
modeling and machine translation. Although its patterns have been exploited to perform …

Cited by 17 Related articles All 6 versions

Over-generation cannot be rewarded: Length-adaptive average lagging for simultaneous speech translation

S Papi, M Gaido, M Negri, M Turchi - arXiv preprint arXiv:2206.05807, 2022 - arxiv.org

Simultaneous speech translation (SimulST) systems aim at generating their output with the
lowest possible latency, which is normally computed in terms of Average Lagging (AL). In …

Cited by 23 Related articles All 10 versions

Learning when to translate for streaming speech

Q Dong, Y Zhu, M Wang, L Li - arXiv preprint arXiv:2109.07368, 2021 - arxiv.org

How to find proper moments to generate partial sentence translation given a streaming
speech input? Existing approaches waiting-and-translating for a fixed duration often break …

Cited by 22 Related articles All 8 versions

A roadmap for big model

S Yuan, H Zhao, S Zhao, J Leng, Y Liang… - arXiv preprint arXiv …, 2022 - arxiv.org

With the rapid development of deep learning, training Big Models (BMs) for multiple
downstream tasks becomes a popular paradigm. Researchers have achieved various …

Cited by 20 Related articles All 2 versions

Learning adaptive segmentation policy for end-to-end simultaneous translation

R Zhang, Z He, H Wu, H Wang - … of the 60th Annual Meeting of the …, 2022 - aclanthology.org

End-to-end simultaneous speech-to-text translation aims to directly perform translation from
streaming source speech to target text with high translation quality and low latency. A typical …

Cited by 13 Related articles All 2 versions

Token-level serialized output training for joint streaming ASR and ST leveraging textual alignments

S Papi, P Wang, J Chen, J Xue, J Li… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

In real-world applications, users often require both translations and transcriptions of speech
to enhance their comprehension, particularly in streaming scenarios where incremental …

Cited by 6 Related articles All 5 versions