A survey of cross-lingual word embedding models
Cross-lingual representations of words enable us to reason about word meaning in
multilingual contexts and are a key facilitator of cross-lingual transfer when developing …
multilingual contexts and are a key facilitator of cross-lingual transfer when developing …
Uc2: Universal cross-lingual cross-modal vision-and-language pre-training
Vision-and-language pre-training has achieved impressive success in learning multimodal
representations between vision and language. To generalize this success to non-English …
representations between vision and language. To generalize this success to non-English …
SemEval-2023 task 1: Visual word sense disambiguation
This paper presents the Visual Word Sense Disambiguation (Visual-WSD) task. The
objective of Visual-WSD is to identify among a set of ten images the one that corresponds to …
objective of Visual-WSD is to identify among a set of ten images the one that corresponds to …
A visual attention grounding neural model for multimodal machine translation
We introduce a novel multimodal machine translation model that utilizes parallel visual and
textual information. Our model jointly optimizes the learning of a shared visual-language …
textual information. Our model jointly optimizes the learning of a shared visual-language …
Image pivoting for learning multilingual multimodal representations
In this paper we propose a model to learn multimodal multilingual representations for
matching images and sentences in different languages, with the aim of advancing …
matching images and sentences in different languages, with the aim of advancing …
Emergent translation in multi-agent communication
While most machine translation systems to date are trained on large parallel corpora,
humans learn language in a different way: by being grounded in an environment and …
humans learn language in a different way: by being grounded in an environment and …
Good for misconceived reasons: An empirical revisiting on the need for visual context in multimodal machine translation
A neural multimodal machine translation (MMT) system is one that aims to perform better
translation by extending conventional text-only translation models with multimodal …
translation by extending conventional text-only translation models with multimodal …
Mule: Multimodal universal language embedding
Existing vision-language methods typically support two languages at a time at most. In this
paper, we present a modular approach which can easily be incorporated into existing vision …
paper, we present a modular approach which can easily be incorporated into existing vision …
Towards zero-shot cross-lingual image retrieval
P Aggarwal, A Kale - arXiv preprint arXiv:2012.05107, 2020 - arxiv.org
There has been a recent spike in interest in multi-modal Language and Vision problems. On
the language side, most of these models primarily focus on English since most multi-modal …
the language side, most of these models primarily focus on English since most multi-modal …
Multi-head attention with diversity for learning grounded multilingual multimodal representations
With the aim of promoting and understanding the multilingual version of image search, we
leverage visual object detection and propose a model with diverse multi-head attention to …
leverage visual object detection and propose a model with diverse multi-head attention to …