Vatex: A large-scale, high-quality multilingual dataset for video-and-language research
We present a new large-scale multilingual video description dataset, VATEX, which contains
over 41,250 videos and 825,000 captions in both English and Chinese. Among the captions …
over 41,250 videos and 825,000 captions in both English and Chinese. Among the captions …
Findings of the second shared task on multimodal machine translation and multilingual image description
We present the results from the second shared task on multimodal machine translation and
multilingual image description. Nine teams submitted 19 systems to two tasks. The …
multilingual image description. Nine teams submitted 19 systems to two tasks. The …
A novel graph-based multi-modal fusion encoder for neural machine translation
Multi-modal neural machine translation (NMT) aims to translate source sentences into a
target language paired with images. However, dominant multi-modal NMT models do not …
target language paired with images. However, dominant multi-modal NMT models do not …
Trends in integration of vision and language research: A survey of tasks, datasets, and methods
A Mogadala, M Kalimuthu, D Klakow - Journal of Artificial Intelligence …, 2021 - jair.org
Abstract Interest in Artificial Intelligence (AI) and its applications has seen unprecedented
growth in the last few years. This success can be partly attributed to the advancements made …
growth in the last few years. This success can be partly attributed to the advancements made …
Probing the need for visual context in multimodal machine translation
O Caglayan, P Madhyastha, L Specia… - arXiv preprint arXiv …, 2019 - arxiv.org
Current work on multimodal machine translation (MMT) has suggested that the visual
modality is either unnecessary or only marginally beneficial. We posit that this is a …
modality is either unnecessary or only marginally beneficial. We posit that this is a …
Neural machine translation with universal visual representation
Though visual information has been introduced for enhancing neural machine translation
(NMT), its effectiveness strongly relies on the availability of large amounts of bilingual …
(NMT), its effectiveness strongly relies on the availability of large amounts of bilingual …
Dynamic context-guided capsule network for multimodal machine translation
Multimodal machine translation (MMT), which mainly focuses on enhancing text-only
translation with visual features, has attracted considerable attention from both computer …
translation with visual features, has attracted considerable attention from both computer …
Neural machine translation with phrase-level universal visual representations
Q Fang, Y Feng - arXiv preprint arXiv:2203.10299, 2022 - arxiv.org
Multimodal machine translation (MMT) aims to improve neural machine translation (NMT)
with additional visual information, but most existing MMT methods require paired input of …
with additional visual information, but most existing MMT methods require paired input of …
Distilling translations with visual awareness
J Ive, P Madhyastha, L Specia - arXiv preprint arXiv:1906.07701, 2019 - arxiv.org
Previous work on multimodal machine translation has shown that visual information is only
needed in very specific cases, for example in the presence of ambiguous words where the …
needed in very specific cases, for example in the presence of ambiguous words where the …
Multimodal machine translation through visuals and speech
Multimodal machine translation involves drawing information from more than one modality,
based on the assumption that the additional modalities will contain useful alternative views …
based on the assumption that the additional modalities will contain useful alternative views …