Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Multimodal intelligence: Representation learning, information fusion, and applications

C Zhang, Z Yang, X He, L Deng - IEEE Journal of Selected …, 2020 - ieeexplore.ieee.org
Deep learning methods have revolutionized speech recognition, image recognition, and
natural language processing since 2010. Each of these tasks involves a single modality in …

Last layer re-training is sufficient for robustness to spurious correlations

P Kirichenko, P Izmailov, AG Wilson - arXiv preprint arXiv:2204.02937, 2022 - arxiv.org
Neural network classifiers can largely rely on simple spurious features, such as
backgrounds, to make predictions. However, even in these cases, we show that they still …
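
The mechanism behind this result is simple enough to sketch: keep the pretrained feature extractor frozen and refit only the final linear layer on a small, group-balanced held-out set, so the head can no longer lean on the dominant spurious group. A minimal sketch, assuming frozen-backbone embeddings are already extracted; all array names and shapes below are illustrative placeholders, not the paper's code.

```python
# Minimal sketch of last-layer retraining: keep the frozen feature
# extractor, refit only a linear head on a group-balanced subset.
# X, y, g and their shapes are illustrative placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Pretend these are frozen-backbone embeddings of a held-out set,
# with class labels y and spurious-group ids g (e.g. background type).
X = rng.normal(size=(1000, 64))      # features from the frozen backbone
y = rng.integers(0, 2, size=1000)    # class labels
g = rng.integers(0, 2, size=1000)    # group (spurious attribute) ids

# Build a group-balanced subset: sample the same number of points from
# every (label, group) cell so no spurious group dominates the head.
cells = [(yi, gi) for yi in np.unique(y) for gi in np.unique(g)]
n_per_cell = min(((y == yi) & (g == gi)).sum() for yi, gi in cells)
idx = np.concatenate([
    rng.choice(np.where((y == yi) & (g == gi))[0], n_per_cell, replace=False)
    for yi, gi in cells
])

# Retrain only the last (linear) layer on the balanced embeddings.
head = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
print("balanced-head accuracy:", head.score(X, y))
```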

Conceptual 12m: Pushing web-scale image-text pre-training to recognize long-tail visual concepts

S Changpinyo, P Sharma, N Ding… - Proceedings of the …, 2021 - openaccess.thecvf.com
The availability of large-scale image captioning and visual question answering datasets has
contributed significantly to recent successes in vision-and-language pre-training. However …

Discovering invariant rationales for graph neural networks

YX Wu, X Wang, A Zhang, X He, TS Chua - arXiv preprint arXiv …, 2022 - arxiv.org
Intrinsic interpretability of graph neural networks (GNNs) aims to find a small subset of the
input graph's features, the rationale, which guides the model prediction. Unfortunately, the leading …
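
As I understand the paper's approach, the rationale generator is trained to be invariant across interventional distributions built by intervening on the non-rationale part of the graph. A hedged paraphrase of that objective, with $s$ indexing the interventions, $\mathcal{R}$ a task risk, and $\lambda$ a trade-off weight; the notation is mine, not verbatim from the paper:

```latex
% Hedged paraphrase of the invariance objective: the rationale g should
% have low risk on average AND low variance of risk across the
% interventional distributions indexed by s (interventions on the
% non-rationale complement S). Notation is mine, not the paper's.
\min_{g}\; \mathbb{E}_{s}\big[\mathcal{R}\big(g \mid \mathrm{do}(S=s)\big)\big]
\;+\; \lambda\,\mathrm{Var}_{s}\big[\mathcal{R}\big(g \mid \mathrm{do}(S=s)\big)\big]
```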

Unbiased scene graph generation from biased training

K Tang, Y Niu, J Huang, J Shi… - Proceedings of the …, 2020 - openaccess.thecvf.com
Today's scene graph generation (SGG) task is still far from practical, mainly due to the
severe training bias, e.g., collapsing the diverse "human walk on/sit on/lay on beach" into "human …
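
The counterfactual fix can be sketched in a few lines: score each predicate once with the real visual evidence and once with it wiped out, then keep only the difference (a total-direct-effect estimate) so the context and label prior cancels. Function and feature names below are placeholders around an assumed trained SGG scoring head, not the authors' implementation.

```python
# Sketch of counterfactual debiasing by Total Direct Effect (TDE):
# score the predicate twice, once with the real pairwise visual feature
# and once with that feature "wiped out" (a zero vector here), and keep
# only the difference. `predicate_logits` stands in for whatever trained
# SGG head produces predicate scores.
import torch

def tde_predicate_scores(predicate_logits, vis_feat, obj_pair_feat):
    """predicate_logits(vis, pair) -> [num_predicates] logits."""
    factual = predicate_logits(vis_feat, obj_pair_feat)
    # Counterfactual pass: same object pair, but no visual evidence,
    # so these logits reflect only the label/context bias.
    counterfactual = predicate_logits(torch.zeros_like(vis_feat), obj_pair_feat)
    return factual - counterfactual  # bias-reduced (TDE) scores

# Toy usage with a stand-in linear head:
head = torch.nn.Linear(16 + 8, 5)
logits_fn = lambda v, p: head(torch.cat([v, p], dim=-1))
scores = tde_predicate_scores(logits_fn, torch.randn(16), torch.randn(8))
print(scores.shape)  # torch.Size([5])
```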

Learning from failure: De-biasing classifier from biased classifier

J Nam, H Cha, S Ahn, J Lee… - Advances in Neural …, 2020 - proceedings.neurips.cc
Neural networks often learn to make predictions that overly rely on spurious correlations
existing in the dataset, which causes the model to be biased. While previous work tackles …
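
The recipe trains a deliberately biased auxiliary model alongside the main one and upweights the samples the biased model fails on. A minimal sketch, assuming the standard generalized cross-entropy (GCE) formulation with q=0.7 and the relative-difficulty weight w = CE_b / (CE_b + CE_d); tensor shapes are toy placeholders.

```python
# Sketch of the learning-from-failure recipe: a deliberately biased model
# trained with generalized cross-entropy (GCE) latches onto easy/spurious
# samples; the main model then upweights exactly the samples the biased
# model fails on (the bias-conflicting ones).
import torch
import torch.nn.functional as F

def gce_loss(logits, targets, q=0.7):
    # Generalized cross-entropy: emphasizes samples the model already
    # gets right, which is what makes the auxiliary model "biased".
    p = F.softmax(logits, dim=1).gather(1, targets[:, None]).squeeze(1)
    return ((1.0 - p.clamp_min(1e-8) ** q) / q).mean()

def relative_difficulty_weights(logits_biased, logits_debiased, targets):
    # w(x) = CE_b / (CE_b + CE_d): large when the biased model fails,
    # i.e. on bias-conflicting samples the main model should focus on.
    ce_b = F.cross_entropy(logits_biased, targets, reduction="none")
    ce_d = F.cross_entropy(logits_debiased, targets, reduction="none")
    return ce_b / (ce_b + ce_d + 1e-8)

# Toy training step with random stand-in logits:
logits_b, logits_d = torch.randn(4, 3), torch.randn(4, 3)
y = torch.tensor([0, 2, 1, 0])
w = relative_difficulty_weights(logits_b.detach(), logits_d.detach(), y)
main_loss = (w * F.cross_entropy(logits_d, y, reduction="none")).mean()
bias_loss = gce_loss(logits_b, y)
```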

Counterfactual vqa: A cause-effect look at language bias

Y Niu, K Tang, H Zhang, Z Lu… - Proceedings of the …, 2021 - openaccess.thecvf.com
Recent VQA models tend to rely on language bias as a shortcut and thus fail to
sufficiently learn the multi-modal knowledge from both vision and language. In this paper …
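
The cause-effect view translates into a small inference-time change: read the fused prediction as a total causal effect and subtract the natural direct effect of the question, estimated by replacing the visual branch with a learned constant. A sketch under those assumptions; the fusion function, branch names, and shapes are illustrative, and the paper studies several fusion strategies.

```python
# Sketch of counterfactual VQA debiasing: treat the model output as a
# total causal effect and subtract the direct effect of the question,
# computed by swapping the visual branch for a learned constant so the
# pure language prior cannot act as a shortcut.
import torch

class CounterfactualVQAHead(torch.nn.Module):
    def __init__(self, num_answers):
        super().__init__()
        # Learned placeholder logits standing in for "no visual input".
        self.c = torch.nn.Parameter(torch.zeros(num_answers))

    def fuse(self, v_logits, q_logits):
        # One simple fusion choice (sum); the paper explores several.
        return v_logits + q_logits

    def forward(self, v_logits, q_logits):
        total_effect = self.fuse(v_logits, q_logits)            # factual
        language_nde = self.fuse(self.c.expand_as(v_logits),    # counterfactual
                                 q_logits)
        return total_effect - language_nde                      # debiased scores

head = CounterfactualVQAHead(num_answers=10)
scores = head(torch.randn(2, 10), torch.randn(2, 10))
answers = scores.argmax(dim=1)
```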

Debiasing graph neural networks via learning disentangled causal substructure

S Fan, X Wang, Y Mo, C Shi… - Advances in Neural …, 2022 - proceedings.neurips.cc
Most Graph Neural Networks (GNNs) predict the labels of unseen graphs by
learning the correlation between the input graphs and labels. However, by presenting a …

Counterfactual samples synthesizing for robust visual question answering

L Chen, X Yan, J Xiao, H Zhang… - Proceedings of the …, 2020 - openaccess.thecvf.com
Although Visual Question Answering (VQA) has achieved impressive progress over
the last few years, today's VQA models tend to capture superficial linguistic correlations in …
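
The synthesis step can be sketched concretely: identify the visual objects (or question words) most critical to the current answer, mask them, and train the model so the original answer is no longer acceptable for the masked sample. The importance scores, shapes, and helper below are hypothetical stand-ins for the visual half of the paper's counterfactual-sample procedure.

```python
# Sketch of counterfactual sample synthesis for VQA: mask the visual
# objects most important to the answer so the model is forced to rely
# on the right regions rather than linguistic shortcuts. Importance
# scores and feature shapes are simplified placeholders.
import torch

def synthesize_visual_counterfactual(obj_feats, importance, k=2):
    """obj_feats: [num_objs, d]; importance: [num_objs] relevance scores."""
    masked = obj_feats.clone()
    critical = importance.topk(k).indices
    masked[critical] = 0.0          # remove the evidence for the answer
    return masked, critical

obj_feats = torch.randn(36, 2048)   # detector region features (toy)
importance = torch.rand(36)         # e.g. gradient- or attention-based scores
cf_feats, dropped = synthesize_visual_counterfactual(obj_feats, importance)
# Training then pairs (cf_feats, question) with a target that excludes the
# original ground-truth answer, penalizing reliance on masked evidence.
```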