Multimodal research in vision and language: A review of current and emerging trends
Deep Learning and its applications have cascaded impactful research and development
across a diverse range of modalities present in real-world data. More recently, this has …
VATEX: A large-scale, high-quality multilingual dataset for video-and-language research
We present a new large-scale multilingual video description dataset, VATEX, which contains
over 41,250 videos and 825,000 captions in both English and Chinese. Among the captions …
Deep vision multimodal learning: Methodology, benchmark, and trend
Deep vision multimodal learning aims at combining deep visual representation learning with
other modalities, such as text, sound, and data collected from other sensors. With the fast …
Visual pivoting for (unsupervised) entity alignment
This work studies the use of visual semantic representations to align entities in
heterogeneous knowledge graphs (KGs). Images are natural components of many existing …
Multimodal transformer for multimodal machine translation
Multimodal Machine Translation (MMT) aims to introduce information from another
modality, generally static images, to improve translation quality. Previous works propose …
Findings of the second shared task on multimodal machine translation and multilingual image description
We present the results from the second shared task on multimodal machine translation and
multilingual image description. Nine teams submitted 19 systems to two tasks. The …
A novel graph-based multi-modal fusion encoder for neural machine translation
Multi-modal neural machine translation (NMT) aims to translate source sentences paired
with images into a target language. However, dominant multi-modal NMT models do not …
Trends in integration of vision and language research: A survey of tasks, datasets, and methods
A Mogadala, M Kalimuthu, D Klakow - Journal of Artificial Intelligence …, 2021 - jair.org
Interest in Artificial Intelligence (AI) and its applications has seen unprecedented
growth in the last few years. This success can be partly attributed to the advancements made …
Probing the need for visual context in multimodal machine translation
O Caglayan, P Madhyastha, L Specia… - arXiv preprint arXiv …, 2019 - arxiv.org
Current work on multimodal machine translation (MMT) has suggested that the visual
modality is either unnecessary or only marginally beneficial. We posit that this is a …
UC2: Universal cross-lingual cross-modal vision-and-language pre-training
Vision-and-language pre-training has achieved impressive success in learning multimodal
representations between vision and language. To generalize this success to non-English …