Transformers in speech processing: A survey

S Latif, A Zaidi, H Cuayahuitl, F Shamshad… - arXiv preprint arXiv …, 2023 - arxiv.org
The remarkable success of transformers in the field of natural language processing has
sparked the interest of the speech-processing community, leading to an exploration of their …

Information-transport-based policy for simultaneous translation

S Zhang, Y Feng - arXiv preprint arXiv:2210.12357, 2022 - arxiv.org
Simultaneous translation (ST) outputs translation while receiving the source inputs, and
hence requires a policy to determine whether to translate a target token or wait for the next …

Unified segment-to-segment framework for simultaneous sequence generation

S Zhang, Y Feng - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Simultaneous sequence generation is a pivotal task for real-time scenarios, such as
streaming speech recognition, simultaneous machine translation and simultaneous speech …

End-to-End Speech-to-Text Translation: A Survey

N Sethiya, CK Maurya - arXiv preprint arXiv:2312.01053, 2023 - arxiv.org
Speech-to-text translation pertains to the task of converting speech signals in a language to
text in another language. It finds its application in various domains, such as hands-free …

Attention as a guide for simultaneous speech translation

S Papi, M Negri, M Turchi - arXiv preprint arXiv:2212.07850, 2022 - arxiv.org
The study of the attention mechanism has sparked interest in many fields, such as language
modeling and machine translation. Although its patterns have been exploited to perform …

Over-generation cannot be rewarded: Length-adaptive average lagging for simultaneous speech translation

S Papi, M Gaido, M Negri, M Turchi - arXiv preprint arXiv:2206.05807, 2022 - arxiv.org
Simultaneous speech translation (SimulST) systems aim at generating their output with the
lowest possible latency, which is normally computed in terms of Average Lagging (AL). In …

Learning when to translate for streaming speech

Q Dong, Y Zhu, M Wang, L Li - arXiv preprint arXiv:2109.07368, 2021 - arxiv.org
How to find proper moments to generate partial sentence translation given a streaming
speech input? Existing approaches waiting-and-translating for a fixed duration often break …

Learning adaptive segmentation policy for end-to-end simultaneous translation

R Zhang, Z He, H Wu, H Wang - … of the 60th Annual Meeting of the …, 2022 - aclanthology.org
End-to-end simultaneous speech-to-text translation aims to directly perform translation from
streaming source speech to target text with high translation quality and low latency. A typical …

AlignAtt: Using attention-based audio-translation alignments as a guide for simultaneous speech translation

S Papi, M Turchi, M Negri - arXiv preprint arXiv:2305.11408, 2023 - arxiv.org
Attention is the core mechanism of today's most used architectures for natural language
processing and has been analyzed from many perspectives, including its effectiveness for …

Recent Advances in End-to-End Simultaneous Speech Translation

X Liu, G Hu, Y Du, E He, YF Luo, C Xu, T Xiao… - arXiv preprint arXiv …, 2024 - arxiv.org
Simultaneous speech translation (SimulST) is a demanding task that involves generating
translations in real-time while continuously processing speech input. This paper offers a …