Multimodal research in vision and language: A review of current and emerging trends
Deep Learning and its applications have catalysed impactful research and development
across the diverse range of modalities present in real-world data. More recently, this has …
Multimodal few-shot learning with frozen language models
M Tsimpoukelli, JL Menick, S Cabi… - Advances in …, 2021 - proceedings.neurips.cc
When trained at sufficient scale, auto-regressive language models exhibit the notable ability
to learn a new language task after being prompted with just a few examples. Here, we …
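A minimal sketch of the idea behind this entry, under my own assumptions rather than the authors' released code: a small trainable vision encoder maps each image to a handful of prefix embeddings, which are prepended to the text embeddings of a completely frozen autoregressive language model. Here GPT-2 stands in for the much larger model used in the paper, and VisualPrefix is a hypothetical toy encoder, not the vision backbone the authors trained.

```python
# Minimal sketch of prefix-style multimodal prompting with a frozen LM.
# Assumptions: GPT-2 stands in for the paper's larger LM; VisualPrefix is a
# hypothetical toy encoder (a single linear layer), not the paper's vision backbone.
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2")
for p in lm.parameters():                  # the language model stays frozen
    p.requires_grad = False

class VisualPrefix(nn.Module):
    """Trainable image-to-prefix encoder: one image -> n_prefix token embeddings."""
    def __init__(self, n_prefix=2, d_model=768, image_dim=3 * 224 * 224):
        super().__init__()
        self.n_prefix = n_prefix
        self.proj = nn.Linear(image_dim, n_prefix * d_model)

    def forward(self, images):                          # images: (B, 3, 224, 224)
        out = self.proj(images.flatten(1))
        return out.view(images.size(0), self.n_prefix, -1)

prefix_encoder = VisualPrefix()
images = torch.randn(1, 3, 224, 224)
text = tokenizer("A small red", return_tensors="pt")
text_embeds = lm.transformer.wte(text["input_ids"])     # (1, T, 768)
inputs = torch.cat([prefix_encoder(images), text_embeds], dim=1)
logits = lm(inputs_embeds=inputs).logits                # gradients reach only prefix_encoder
```

Because the frozen weights still propagate gradients to their inputs, only the prefix encoder is updated during training, which is what lets the language model's few-shot prompting ability carry over to image-conditioned prompts.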
DocFormer: End-to-end transformer for document understanding
We present DocFormer, a multi-modal transformer-based architecture for the task of Visual
Document Understanding (VDU). VDU is a challenging problem which aims to understand …
Are multimodal transformers robust to missing modality?
Multimodal data collected from the real world are often imperfect due to missing modalities.
Therefore multimodal models that are robust against modal-incomplete data are highly …
Frozen pretrained transformers as universal computation engines
We investigate the capability of a transformer pretrained on natural language to generalize
to other modalities with minimal finetuning--in particular, without finetuning of the self …
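A rough sketch of the recipe this entry investigates, written against Hugging Face GPT-2 and a toy task of my own rather than the paper's code: the self-attention and feed-forward weights of the language-pretrained transformer are left frozen, and only small new input/output layers (plus the layer norms) are trained on the new modality. FrozenGPT2Classifier and the bit-sequence task are illustrative names, not the paper's exact setup.

```python
# Minimal sketch of a "frozen pretrained transformer" on a toy bit-sequence task.
# Assumptions: GPT-2 small as the frozen backbone; the 1-D input projection and
# the classification task are placeholders for illustration.
import torch
import torch.nn as nn
from transformers import GPT2Model

class FrozenGPT2Classifier(nn.Module):
    def __init__(self, n_classes=2, d_model=768):
        super().__init__()
        self.gpt2 = GPT2Model.from_pretrained("gpt2")
        for name, p in self.gpt2.named_parameters():
            # freeze everything in the pretrained backbone except the layer norms
            p.requires_grad = "ln" in name
        self.input_proj = nn.Linear(1, d_model)             # new, trainable input layer
        self.output_head = nn.Linear(d_model, n_classes)    # new, trainable readout

    def forward(self, bits):                                # bits: (B, T) floats in {0, 1}
        h = self.input_proj(bits.unsqueeze(-1))             # (B, T, d_model)
        h = self.gpt2(inputs_embeds=h).last_hidden_state
        return self.output_head(h[:, -1])                   # classify from the last position

model = FrozenGPT2Classifier()
logits = model(torch.randint(0, 2, (4, 16)).float())        # (4, n_classes)
```

The trainable parameter count here is a tiny fraction of the backbone, which is the point of the study: how much of the pretrained computation transfers when the attention and feed-forward weights are never touched.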
Trusted multi-view classification with dynamic evidential fusion
Existing multi-view classification algorithms focus on promoting accuracy by exploiting
different views, typically integrating them into common representations for follow-up tasks …
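A short NumPy sketch of the kind of evidential fusion the title refers to, based on my reading of the approach rather than the released code: each view's classifier emits non-negative evidence, which is converted into per-class belief masses plus an explicit uncertainty mass, and opinions from different views are combined with a reduced Dempster rule that discounts conflicting evidence.

```python
# Minimal sketch of evidence-based fusion of two views over the same K classes.
# The evidence values in the toy example are made up for illustration.
import numpy as np

def evidence_to_opinion(evidence):
    """Map non-negative evidence (K,) to per-class beliefs (K,) and an uncertainty scalar."""
    K = evidence.shape[0]
    S = evidence.sum() + K                 # Dirichlet strength with alpha = evidence + 1
    return evidence / S, K / S

def combine(b1, u1, b2, u2):
    """Reduced Dempster's rule for two opinions over the same classes."""
    conflict = np.sum(np.outer(b1, b2)) - np.sum(b1 * b2)   # sum over i != j of b1_i * b2_j
    scale = 1.0 - conflict
    b = (b1 * b2 + b1 * u2 + b2 * u1) / scale
    u = (u1 * u2) / scale
    return b, u

# toy example: two views, three classes; view 1 is confident, view 2 is vague
b1, u1 = evidence_to_opinion(np.array([9.0, 1.0, 0.0]))
b2, u2 = evidence_to_opinion(np.array([1.0, 1.0, 1.0]))
b, u = combine(b1, u1, b2, u2)
print(b, u)   # fused beliefs still favour class 0; uncertainty drops below either view's own
```

The appeal of this style of fusion is that the combined uncertainty is an explicit output, so downstream decisions can be deferred when all views agree that they do not know.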
MISA: Modality-invariant and -specific representations for multimodal sentiment analysis
Multimodal Sentiment Analysis is an active area of research that leverages multimodal
signals for affective understanding of user-generated videos. The predominant approach …
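A compact sketch of the factorisation the title points at, as my own illustration rather than the MISA architecture itself: each modality is projected into a modality-invariant part by a shared encoder and a modality-specific part by a per-modality encoder, with a simple orthogonality penalty encouraging the two parts to carry different information before fusion. Module names, feature dimensions, and the exact penalty are placeholders.

```python
# Minimal sketch of invariant/specific factorisation for multimodal fusion.
# All names, dimensions and the orthogonality penalty below are illustrative.
import torch
import torch.nn as nn

class InvariantSpecific(nn.Module):
    def __init__(self, dims, d=128):
        super().__init__()
        self.to_common = nn.ModuleDict({m: nn.Linear(dim, d) for m, dim in dims.items()})
        self.shared = nn.Linear(d, d)                                     # modality-invariant encoder
        self.private = nn.ModuleDict({m: nn.Linear(d, d) for m in dims})  # modality-specific encoders

    def forward(self, feats):                      # feats: {modality: (B, dim)}
        inv, spec = {}, {}
        for m, x in feats.items():
            h = torch.relu(self.to_common[m](x))
            inv[m], spec[m] = self.shared(h), self.private[m](h)
        # orthogonality penalty: the invariant and specific parts of each modality
        # should carry non-overlapping information
        ortho = sum((inv[m] * spec[m]).sum(-1).pow(2).mean() for m in feats)
        fused = torch.cat(list(inv.values()) + list(spec.values()), dim=-1)
        return fused, ortho

model = InvariantSpecific({"text": 300, "audio": 74, "video": 35})
feats = {"text": torch.randn(4, 300), "audio": torch.randn(4, 74), "video": torch.randn(4, 35)}
fused, ortho_loss = model(feats)                   # fused: (4, 6 * 128)
```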
The hateful memes challenge: Detecting hate speech in multimodal memes
This work proposes a new challenge set for multimodal classification, focusing on detecting
hate speech in multimodal memes. It is constructed such that unimodal models struggle and …
Efficient multimodal fusion via interactive prompting
Large-scale pre-training has brought unimodal fields such as computer vision and natural
language processing to a new era. Following this trend, the size of multimodal learning …
MultiBench: Multiscale benchmarks for multimodal representation learning
Learning multimodal representations involves integrating information from multiple
heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world …