A survey of cross-lingual word embedding models

S Ruder, I Vulić, A Søgaard - Journal of Artificial Intelligence Research, 2019 - jair.org
Cross-lingual representations of words enable us to reason about word meaning in
multilingual contexts and are a key facilitator of cross-lingual transfer when developing …

UC2: Universal cross-lingual cross-modal vision-and-language pre-training

M Zhou, L Zhou, S Wang, Y Cheng… - Proceedings of the …, 2021 - openaccess.thecvf.com
Vision-and-language pre-training has achieved impressive success in learning multimodal
representations between vision and language. To generalize this success to non-English …

SemEval-2023 task 1: Visual word sense disambiguation

A Raganato, I Calixto, A Ushio… - … 2023-Proceedings of …, 2023 - boa.unimib.it
This paper presents the Visual Word Sense Disambiguation (Visual-WSD) task. The
objective of Visual-WSD is to identify among a set of ten images the one that corresponds to …

A visual attention grounding neural model for multimodal machine translation

M Zhou, R Cheng, YJ Lee, Z Yu - arXiv preprint arXiv:1808.08266, 2018 - arxiv.org
We introduce a novel multimodal machine translation model that utilizes parallel visual and
textual information. Our model jointly optimizes the learning of a shared visual-language …

Image pivoting for learning multilingual multimodal representations

S Gella, R Sennrich, F Keller, M Lapata - arXiv preprint arXiv:1707.07601, 2017 - arxiv.org
In this paper we propose a model to learn multimodal multilingual representations for
matching images and sentences in different languages, with the aim of advancing …

Emergent translation in multi-agent communication

J Lee, K Cho, J Weston, D Kiela - arXiv preprint arXiv:1710.06922, 2017 - arxiv.org
While most machine translation systems to date are trained on large parallel corpora,
humans learn language in a different way: by being grounded in an environment and …

Good for misconceived reasons: An empirical revisiting on the need for visual context in multimodal machine translation

Z Wu, L Kong, W Bi, X Li, B Kao - arXiv preprint arXiv:2105.14462, 2021 - arxiv.org
A neural multimodal machine translation (MMT) system is one that aims to perform better
translation by extending conventional text-only translation models with multimodal …

MULE: Multimodal universal language embedding

D Kim, K Saito, K Saenko, S Sclaroff… - Proceedings of the AAAI …, 2020 - ojs.aaai.org
Existing vision-language methods typically support two languages at a time at most. In this
paper, we present a modular approach which can easily be incorporated into existing vision …

Towards zero-shot cross-lingual image retrieval

P Aggarwal, A Kale - arXiv preprint arXiv:2012.05107, 2020 - arxiv.org
There has been a recent spike in interest in multi-modal Language and Vision problems. On
the language side, most of these models primarily focus on English since most multi-modal …

Multi-head attention with diversity for learning grounded multilingual multimodal representations

PY Huang, X Chang, A Hauptmann - arXiv preprint arXiv:1910.00058, 2019 - arxiv.org
With the aim of promoting and understanding the multilingual version of image search, we
leverage visual object detection and propose a model with diverse multi-head attention to …