Multimodal research in vision and language: A review of current and emerging trends
Deep learning and its applications have spurred impactful research and development
across the diverse range of modalities present in real-world data. More recently, this has …
A systematic literature review on multimodal machine learning: Applications, challenges, gaps and future directions
Multimodal machine learning (MML) is a compelling multidisciplinary research area where
heterogeneous data from multiple modalities and machine learning (ML) are combined to …
Vatex: A large-scale, high-quality multilingual dataset for video-and-language research
We present a new large-scale multilingual video description dataset, VATEX, which contains
over 41,250 videos and 825,000 captions in both English and Chinese. Among the captions …
Learning multi-view aggregation in the wild for large-scale 3d semantic segmentation
Recent works on 3D semantic segmentation propose to exploit the synergy between images
and point clouds by processing each modality with a dedicated network and projecting …
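The aggregation idea named in that snippet can be illustrated in a few lines of NumPy: per-pixel features from each view are projected onto the 3D points and mean-pooled alongside the point features. The projection helper and the mean-pooling choice below are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of multi-view aggregation: image features from several views
# are projected onto 3D points and pooled with the point features.
# The helper names and mean pooling are assumptions for illustration only.
import numpy as np

def project_to_pixels(points, cam):
    """Pinhole projection of Nx3 world points with a 3x4 camera matrix."""
    homo = np.hstack([points, np.ones((len(points), 1))])      # N x 4
    uvw = homo @ cam.T                                          # N x 3
    depth = uvw[:, 2]
    pixels = uvw[:, :2] / np.maximum(depth, 1e-9)[:, None]     # avoid divide-by-zero
    return pixels, depth

def aggregate_views(points, point_feats, view_feats, cams):
    """Average the image features seen by each point across all views."""
    n, c = len(points), view_feats[0].shape[-1]
    pooled, counts = np.zeros((n, c)), np.zeros((n, 1))
    for feats, cam in zip(view_feats, cams):                    # feats: H x W x C
        h, w, _ = feats.shape
        px, depth = project_to_pixels(points, cam)
        u, v = px[:, 0].round().astype(int), px[:, 1].round().astype(int)
        visible = (depth > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        pooled[visible] += feats[v[visible], u[visible]]
        counts[visible] += 1
    pooled /= np.maximum(counts, 1)
    return np.concatenate([point_feats, pooled], axis=1)        # fused per-point features
```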
Findings of the second shared task on multimodal machine translation and multilingual image description
We present the results from the second shared task on multimodal machine translation and
multilingual image description. Nine teams submitted 19 systems to two tasks. The …
A novel graph-based multi-modal fusion encoder for neural machine translation
Multi-modal neural machine translation (NMT) aims to translate source sentences paired
with images into a target language. However, dominant multi-modal NMT models do not …
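As a rough illustration of the fusion idea named in the title, the sketch below stacks word and object-region embeddings into one joint graph and runs a single round of message passing. The layer design, the GRU update, and the toy adjacency are assumptions chosen to keep the example self-contained, not the paper's model.

```python
# Minimal sketch of a joint text-image graph with one round of message passing,
# assuming word nodes and detected-object nodes are already encoded as vectors.
# The adjacency (which nodes exchange messages) is supplied by the caller.
import torch
import torch.nn as nn

class GraphFusionLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(dim, dim)      # transform incoming neighbour states
        self.upd = nn.GRUCell(dim, dim)     # update node state with aggregated message

    def forward(self, nodes, adj):
        # nodes: (N, dim) word + object embeddings stacked into one graph
        # adj:   (N, N) 0/1 matrix, adj[i, j] = 1 if node j sends a message to node i
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        agg = (adj @ self.msg(nodes)) / deg          # mean of neighbour messages
        return self.upd(agg, nodes)                  # fused node representations

# usage: text nodes then visual nodes, with intra- and cross-modal edges (toy choice)
words, objects = torch.randn(6, 256), torch.randn(3, 256)
nodes = torch.cat([words, objects], dim=0)
adj = torch.zeros(9, 9)
adj[:6, :6] = 1          # words exchange messages with words
adj[:6, 6:] = 1          # words receive from objects
adj[6:, :6] = 1          # objects receive from words
fused = GraphFusionLayer(256)(nodes, adj)            # (9, 256)
```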
Probing the need for visual context in multimodal machine translation
Current work on multimodal machine translation (MMT) has suggested that the visual
modality is either unnecessary or only marginally beneficial. We posit that this is a …
Findings of the third shared task on multimodal machine translation
We present the results from the third shared task on multimodal machine translation. In this
task a source sentence in English is supplemented by an image and participating systems …
Doubly-attentive decoder for multi-modal neural machine translation
We introduce a Multi-modal Neural Machine Translation model in which a doubly-attentive
decoder naturally incorporates spatial visual features obtained using pre-trained …
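A doubly-attentive decoding step can be pictured as two attention reads per time step, one over the source-sentence states and one over spatial image features, whose context vectors jointly drive the state update. The dimensions and the use of a GRU cell below are assumptions for illustration, not the paper's exact architecture.

```python
# Minimal sketch of one doubly-attentive decoder step: attend over source text
# states and over spatial CNN features, then update the hidden state with both
# context vectors. All sizes and the GRU cell are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoublyAttentiveStep(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q_txt = nn.Linear(dim, dim)    # query projection for the text attention
        self.q_img = nn.Linear(dim, dim)    # query projection for the image attention
        self.cell = nn.GRUCell(2 * dim, dim)

    def attend(self, query, keys):
        # keys: (L, dim); returns a single context vector of shape (dim,)
        scores = keys @ query
        return (F.softmax(scores, dim=0).unsqueeze(1) * keys).sum(dim=0)

    def forward(self, hidden, src_states, img_feats):
        # hidden: (dim,)  src_states: (L_src, dim)  img_feats: (L_img, dim)
        ctx_txt = self.attend(self.q_txt(hidden), src_states)
        ctx_img = self.attend(self.q_img(hidden), img_feats)
        ctx = torch.cat([ctx_txt, ctx_img], dim=0)   # (2*dim,)
        return self.cell(ctx.unsqueeze(0), hidden.unsqueeze(0)).squeeze(0)

step = DoublyAttentiveStep(256)
h_next = step(torch.zeros(256), torch.randn(10, 256), torch.randn(49, 256))
```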
Evaluating discourse phenomena in neural machine translation
For machine translation to tackle discourse phenomena, models must have access to extra-
sentential linguistic context. There has been recent interest in modelling context in neural …