How2: a large-scale dataset for multimodal language understanding

R Sanabria, O Caglayan, S Palaskar, D Elliott… - arXiv preprint arXiv …, 2018 - arxiv.org
In this paper, we introduce How2, a multimodal collection of instructional videos with English
subtitles and crowdsourced Portuguese translations. We also present integrated sequence …

Probing the need for visual context in multimodal machine translation

O Caglayan, P Madhyastha, L Specia… - arXiv preprint arXiv …, 2019 - arxiv.org
Current work on multimodal machine translation (MMT) has suggested that the visual
modality is either unnecessary or only marginally beneficial. We posit that this is a …

Multimodal abstractive summarization for How2 videos

S Palaskar, J Libovický, S Gella, F Metze - arXiv preprint arXiv:1906.07901, 2019 - arxiv.org
In this paper, we study abstractive summarization for open-domain videos. Unlike the
traditional text news summarization, the goal is less to "compress" text information but rather …

Unsupervised multimodal machine translation for low-resource distant language pairs

T Tayir, L Li - ACM Transactions on Asian and Low-Resource …, 2024 - dl.acm.org
Unsupervised machine translation (UMT) has recently attracted more attention from
researchers, enabling models to translate when languages lack parallel corpora. However …

Multimodal machine translation through visuals and speech

U Sulubacak, O Caglayan, SA Grönroos, A Rouhe… - Machine …, 2020 - Springer
Multimodal machine translation involves drawing information from more than one modality,
based on the assumption that the additional modalities will contain useful alternative views …

LIUM-CVC submissions for WMT17 multimodal translation task

O Caglayan, W Aransa, A Bardet… - arXiv preprint arXiv …, 2017 - arxiv.org
This paper describes the monomodal and multimodal Neural Machine Translation systems
developed by LIUM and CVC for WMT17 Shared Task on Multimodal Translation. We mainly …

Region-attentive multimodal neural machine translation

Y Zhao, M Komachi, T Kajiwara, C Chu - Neurocomputing, 2022 - Elsevier
We propose a multimodal neural machine translation (MNMT) method with semantic image
regions called region-attentive multimodal neural machine translation (RA-NMT). Existing …

Intelligent system for English translation using automated knowledge base

S Bi - Journal of Intelligent & Fuzzy Systems, 2020 - content.iospress.com
In the process of globalization, machine translation has undergone a long period of
evolution and development. Although the development level of machine translation has …

Evaluating the morphological competence of machine translation systems

F Burlot, F Yvon - 2nd Conference on Machine Translation (WMT17), 2017 - hal.science
While recent changes in Machine Translation state-of-the-art brought translation quality a
step further, it is regularly acknowledged that the standard automatic metrics do not provide …

MAST: Multimodal abstractive summarization with trimodal hierarchical attention

A Khullar, U Arora - arXiv preprint arXiv:2010.08021, 2020 - arxiv.org
This paper presents MAST, a new model for Multimodal Abstractive Text Summarization that
utilizes information from all three modalities--text, audio and video--in a multimodal video …