Multimodal research in vision and language: A review of current and emerging trends

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier
Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …

A systematic literature review on multimodal machine learning: Applications, challenges, gaps and future directions

A Barua, MU Ahmed, S Begum - IEEE Access, 2023 - ieeexplore.ieee.org
Multimodal machine learning (MML) is a tempting multidisciplinary research area where
heterogeneous data from multiple modalities and machine learning (ML) are combined to …

VATEX: A large-scale, high-quality multilingual dataset for video-and-language research

X Wang, J Wu, J Chen, L Li… - Proceedings of the …, 2019 - openaccess.thecvf.com
We present a new large-scale multilingual video description dataset, VATEX, which contains
over 41,250 videos and 825,000 captions in both English and Chinese. Among the captions …

Learning multi-view aggregation in the wild for large-scale 3D semantic segmentation

D Robert, B Vallet, L Landrieu - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Recent works on 3D semantic segmentation propose to exploit the synergy between images
and point clouds by processing each modality with a dedicated network and projecting …

Findings of the second shared task on multimodal machine translation and multilingual image description

D Elliott, S Frank, L Barrault, F Bougares… - arXiv preprint arXiv …, 2017 - arxiv.org
We present the results from the second shared task on multimodal machine translation and
multilingual image description. Nine teams submitted 19 systems to two tasks. The …

A novel graph-based multi-modal fusion encoder for neural machine translation

Y Yin, F Meng, J Su, C Zhou, Z Yang, J Zhou… - arXiv preprint arXiv …, 2020 - arxiv.org
Multi-modal neural machine translation (NMT) aims to translate source sentences into a
target language paired with images. However, dominant multi-modal NMT models do not …

Probing the need for visual context in multimodal machine translation

O Caglayan, P Madhyastha, L Specia… - arXiv preprint arXiv …, 2019 - arxiv.org
Current work on multimodal machine translation (MMT) has suggested that the visual
modality is either unnecessary or only marginally beneficial. We posit that this is a …

Findings of the third shared task on multimodal machine translation

L Barrault, F Bougares, L Specia, C Lala… - Third Conference on …, 2018 - hal.science
We present the results from the third shared task on multimodal machine translation. In this
task a source sentence in English is supplemented by an image and participating systems …

Doubly-attentive decoder for multi-modal neural machine translation

I Calixto, Q Liu, N Campbell - arXiv preprint arXiv:1702.01287, 2017 - arxiv.org
We introduce a Multi-modal Neural Machine Translation model in which a doubly-attentive
decoder naturally incorporates spatial visual features obtained using pre-trained …

Evaluating discourse phenomena in neural machine translation

R Bawden, R Sennrich, A Birch, B Haddow - arXiv preprint arXiv …, 2017 - arxiv.org
For machine translation to tackle discourse phenomena, models must have access to
extra-sentential linguistic context. There has been recent interest in modelling context in neural …