Multimodal research in vision and language: A review of current and emerging trends

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier
Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …

Deep vision multimodal learning: Methodology, benchmark, and trend

W Chai, G Wang - Applied Sciences, 2022 - mdpi.com
Deep vision multimodal learning aims at combining deep visual representation learning with
other modalities, such as text, sound, and data collected from other sensors. With the fast …

Google's multilingual neural machine translation system: Enabling zero-shot translation

M Johnson, M Schuster, QV Le, M Krikun… - Transactions of the …, 2017 - direct.mit.edu
We propose a simple solution to use a single Neural Machine Translation (NMT) model to
translate between multiple languages. Our solution requires no changes to the model …

Visual pivoting for (unsupervised) entity alignment

F Liu, M Chen, D Roth, N Collier - … of the AAAI conference on artificial …, 2021 - ojs.aaai.org
This work studies the use of visual semantic representations to align entities in
heterogeneous knowledge graphs (KGs). Images are natural components of many existing …

Search engine guided neural machine translation

J Gu, Y Wang, K Cho, VOK Li - Proceedings of the AAAI Conference on …, 2018 - ojs.aaai.org
In this paper, we extend an attention-based neural machine translation (NMT) model by
allowing it to access an entire training set of parallel sentence pairs even after training. The …

Zero-resource translation with multi-lingual neural machine translation

O Firat, B Sankaran, Y Al-Onaizan, FTY Vural… - arXiv preprint arXiv …, 2016 - arxiv.org
In this paper, we propose a novel finetuning algorithm for the recently introduced multi-way,
multilingual neural machine translation model that enables zero-resource machine translation. When …

A shared task on multimodal machine translation and crosslingual image description

L Specia, S Frank, K Sima'An… - Proceedings of the First …, 2016 - aclanthology.org
This paper introduces and summarises the findings of a new shared task at the intersection
of Natural Language Processing and Computer Vision: the generation of image descriptions …

Doubly-attentive decoder for multi-modal neural machine translation

I Calixto, Q Liu, N Campbell - arXiv preprint arXiv:1702.01287, 2017 - arxiv.org
We introduce a Multi-modal Neural Machine Translation model in which a doubly-attentive
decoder naturally incorporates spatial visual features obtained using pre-trained …

Attention strategies for multi-source sequence-to-sequence learning

J Libovický, J Helcl - arXiv preprint arXiv:1704.06567, 2017 - arxiv.org
Modeling attention in neural multi-source sequence-to-sequence learning remains a
relatively unexplored area, despite its usefulness in tasks that incorporate multiple source …

Contextual parameter generation for universal neural machine translation

EA Platanios, M Sachan, G Neubig… - arXiv preprint arXiv …, 2018 - arxiv.org
We propose a simple modification to existing neural machine translation (NMT) models that
enables using a single universal model to translate between multiple languages while …