Vision-language pre-training: Basics, recent advances, and future trends
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …
Multimodal intelligence: Representation learning, information fusion, and applications
Deep learning methods have revolutionized speech recognition, image recognition, and
natural language processing since 2010. Each of these tasks involves a single modality in …
Last layer re-training is sufficient for robustness to spurious correlations
Neural network classifiers can largely rely on simple spurious features, such as
backgrounds, to make predictions. However, even in these cases, we show that they still …
Conceptual 12M: Pushing web-scale image-text pre-training to recognize long-tail visual concepts
The availability of large-scale image captioning and visual question answering datasets has
contributed significantly to recent successes in vision-and-language pre-training. However …
Discovering invariant rationales for graph neural networks
Intrinsic interpretability of graph neural networks (GNNs) aims to find a small subset of the input
graph's features -- the rationale -- which guides the model prediction. Unfortunately, the leading …
Unbiased scene graph generation from biased training
Today's scene graph generation (SGG) task is still far from practical, mainly due to the
severe training bias, e.g., collapsing diverse "human walk on/sit on/lay on beach" into "human …
Learning from failure: De-biasing classifier from biased classifier
Neural networks often learn to make predictions that overly rely on spurious correlations
existing in the dataset, which causes the model to be biased. While previous work tackles …
Counterfactual VQA: A cause-effect look at language bias
Recent VQA models may tend to rely on language bias as a shortcut and thus fail to
sufficiently learn the multi-modal knowledge from both vision and language. In this paper …
Debiasing graph neural networks via learning disentangled causal substructure
Most Graph Neural Networks (GNNs) predict the labels of unseen graphs by
learning the correlation between the input graphs and labels. However, by presenting a …
Counterfactual samples synthesizing for robust visual question answering
Although Visual Question Answering (VQA) has achieved impressive progress over
the last few years, today's VQA models tend to capture superficial linguistic correlations in …