Multimodal research in vision and language: A review of current and emerging trends

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier
Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …

Multimodal few-shot learning with frozen language models

M Tsimpoukelli, JL Menick, S Cabi… - Advances in …, 2021 - proceedings.neurips.cc
When trained at sufficient scale, auto-regressive language models exhibit the notable ability
to learn a new language task after being prompted with just a few examples. Here, we …

DocFormer: End-to-end transformer for document understanding

S Appalaraju, B Jasani, BU Kota… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present DocFormer, a multi-modal transformer-based architecture for the task of Visual
Document Understanding (VDU). VDU is a challenging problem which aims to understand …

Are multimodal transformers robust to missing modality?

M Ma, J Ren, L Zhao, D Testuggine… - Proceedings of the …, 2022 - openaccess.thecvf.com
Multimodal data collected from the real world are often imperfect due to missing modalities.
Therefore multimodal models that are robust against modal-incomplete data are highly …

Frozen pretrained transformers as universal computation engines

K Lu, A Grover, P Abbeel, I Mordatch - Proceedings of the AAAI …, 2022 - ojs.aaai.org
We investigate the capability of a transformer pretrained on natural language to generalize
to other modalities with minimal finetuning--in particular, without finetuning of the self …

Trusted multi-view classification with dynamic evidential fusion

Z Han, C Zhang, H Fu, JT Zhou - IEEE transactions on pattern …, 2022 - ieeexplore.ieee.org
Existing multi-view classification algorithms focus on promoting accuracy by exploiting
different views, typically integrating them into common representations for follow-up tasks …

MISA: Modality-invariant and -specific representations for multimodal sentiment analysis

D Hazarika, R Zimmermann, S Poria - Proceedings of the 28th ACM …, 2020 - dl.acm.org
Multimodal Sentiment Analysis is an active area of research that leverages multimodal
signals for affective understanding of user-generated videos. The predominant approach …

The hateful memes challenge: Detecting hate speech in multimodal memes

D Kiela, H Firooz, A Mohan… - Advances in neural …, 2020 - proceedings.neurips.cc
This work proposes a new challenge set for multimodal classification, focusing on detecting
hate speech in multimodal memes. It is constructed such that unimodal models struggle and …

Efficient multimodal fusion via interactive prompting

Y Li, R Quan, L Zhu, Y Yang - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Large-scale pre-training has brought unimodal fields such as computer vision and natural
language processing to a new era. Following this trend, the size of multimodal learning …

MultiBench: Multiscale benchmarks for multimodal representation learning

PP Liang, Y Lyu, X Fan, Z Wu, Y Cheng… - Advances in neural …, 2021 - ncbi.nlm.nih.gov
Learning multimodal representations involves integrating information from multiple
heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world …