Findings of the IWSLT 2022 Evaluation Campaign

A Anastasopoulos, L Barrault, L Bentivogli… - Proceedings of the 19th …, 2022 - cris.fbk.eu
The evaluation campaign of the 19th International Conference on Spoken Language
Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline …

The Multilingual TEDx corpus for speech recognition and translation

E Salesky, M Wiesner, J Bremerman, R Cattoni… - arXiv preprint arXiv …, 2021 - arxiv.org
We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and
speech translation (ST) research across many non-English source languages. The corpus is …

Understanding the brain with attention: A survey of transformers in brain sciences

C Chen, H Wang, Y Chen, Z Yin, X Yang, H Ning… - Brain‐X, 2023 - Wiley Online Library
Owing to their superior capabilities and advanced achievements, Transformers have
gradually attracted attention with regard to understanding complex brain processing …

Adaptive multilingual speech recognition with pretrained models

NQ Pham, A Waibel, J Niehues - arXiv preprint arXiv:2205.12304, 2022 - arxiv.org
Multilingual speech recognition with supervised learning has achieved great results as
reflected in recent research. With the development of pretraining methods on audio and text …

AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation

J Choi, SJ Park, M Kim, YM Ro - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
This paper proposes a novel direct Audio-Visual Speech to Audio-Visual Speech
Translation (AV2AV) framework where the input and output of the system are multimodal (i.e., …

From block-Toeplitz matrices to differential equations on graphs: towards a general theory for scalable masked Transformers

K Choromanski, H Lin, H Chen… - International …, 2022 - proceedings.mlr.press
In this paper we provide, to the best of our knowledge, the first comprehensive approach for
incorporating various masking mechanisms into Transformers architectures in a scalable …

Incorporating relative position information in transformer-based sign language recognition and translation

N Aloysius, M Geetha, P Nedungadi - IEEE Access, 2021 - ieeexplore.ieee.org
Recent advancements in machine translation tasks, with the advent of attention mechanisms
and Transformer networks, have accelerated the research in Sign Language Translation …

A reverse positional encoding multi-head attention-based neural machine translation model for Arabic dialects

LH Baniata, S Kang, IKE Ampomah - Mathematics, 2022 - mdpi.com
Languages with a grammatical structure that allows free word order, such as Arabic
dialects, are considered a challenge for neural machine translation (NMT) models because …

ESPnet-ST IWSLT 2021 offline speech translation system

H Inaguma, B Yan, S Dalmia, P Guo, J Shi… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper describes the ESPnet-ST group's IWSLT 2021 submission in the offline speech
translation track. This year we made various efforts on training data, architecture, and audio …

Variable attention masking for configurable transformer transducer speech recognition

P Swietojanski, S Braun, D Can… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
This work studies the use of attention masking in transformer transducer based speech
recognition for building a single configurable model for different deployment scenarios. We …