Vision-language pre-training: Basics, recent advances, and future trends
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …
Multimodal intelligence: Representation learning, information fusion, and applications
Deep learning methods have revolutionized speech recognition, image recognition, and
natural language processing since 2010. Each of these tasks involves a single modality in …
Last layer re-training is sufficient for robustness to spurious correlations
Neural network classifiers can largely rely on simple spurious features, such as
backgrounds, to make predictions. However, even in these cases, we show that they still …
Conceptual 12M: Pushing web-scale image-text pre-training to recognize long-tail visual concepts
The availability of large-scale image captioning and visual question answering datasets has
contributed significantly to recent successes in vision-and-language pre-training. However …
Discovering invariant rationales for graph neural networks
Intrinsic interpretability of graph neural networks (GNNs) aims to find a small subset of the input
graph's features -- the rationale -- which guides the model prediction. Unfortunately, the leading …
Unbiased scene graph generation from biased training
Today's scene graph generation (SGG) task is still far from practical, mainly due to the
severe training bias, e.g., collapsing diverse "human walk on/sit on/lay on beach" into "human …
Learning from failure: De-biasing classifier from biased classifier
Neural networks often learn to make predictions that overly rely on spurious correlations
existing in the dataset, which causes the model to be biased. While previous work tackles …
Counterfactual VQA: A cause-effect look at language bias
Recent VQA models may tend to rely on language bias as a shortcut and thus fail to
sufficiently learn the multi-modal knowledge from both vision and language. In this paper …
Debiasing graph neural networks via learning disentangled causal substructure
Most Graph Neural Networks (GNNs) predict the labels of unseen graphs by
learning the correlation between the input graphs and labels. However, by presenting a …
Counterfactual samples synthesizing for robust visual question answering
Although Visual Question Answering (VQA) has achieved impressive progress over
the last few years, today's VQA models tend to capture superficial linguistic correlations in …