Digging out discrimination information from generated samples for robust visual question answering

Z Wen, Y Wang, M Tan, Q Wu, Q Wu - Findings of the Association …, 2023 - aclanthology.org
Visual Question Answering (VQA) aims to answer a textual question based on a
given image. Nevertheless, recent studies have shown that VQA models tend to capture the …

Causal Reasoning through Two Cognition Layers for Improving Generalization in Visual Question Answering

T Nguyen, N Okazaki - Proceedings of the 2023 Conference on …, 2023 - aclanthology.org
Generalization in Visual Question Answering (VQA) requires models to answer
questions about images with contexts beyond the training distribution. Existing attempts …

Robust visual question answering: Datasets, methods, and future challenges

J Ma, P Wang, D Kong, Z Wang, J Liu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Visual question answering requires a system to provide an accurate natural language
answer given an image and a natural language question. However, it is widely recognized …

Object Attribute Matters in Visual Question Answering

P Li, Q Si, P Fu, Z Lin, Y Wang - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Visual question answering is a multimodal task that requires the joint comprehension of
visual and textual information. However, integrating visual and textual semantics solely …

Balancing and contrasting biased samples for debiased visual question answering

R Cao, Z Li - 2023 IEEE international conference on data …, 2023 - ieeexplore.ieee.org
The goal of Visual Question Answering (VQA) is to test the reasoning ability of an intelligent
agent by evaluating visual and textual information. However, recent studies suggest that …

Simple contrastive learning in a self-supervised manner for robust visual question answering

S Yang, L Xiao, X Wu, J Xu, L Wang, L He - Computer Vision and Image …, 2024 - Elsevier
Recent observations have revealed that Visual Question Answering models are susceptible
to learning the spurious correlations formed by dataset biases, i.e., the language priors …

Margin and Shared Proxies: Advanced Proxy Anchor Loss for Out-of-Domain Intent Classification

J Park, B Kim, S Han, S Ji, J Rhee - Applied Sciences, 2024 - mdpi.com
Out-of-Domain (OOD) intent classification is an important task for a dialog system, as it
allows for appropriate responses to be generated. Previous studies aiming to solve the OOD …

Interpretable visual question answering via reasoning supervision

M Parelli, D Mallis, M Diomataris… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Transformer-based architectures have recently demonstrated remarkable performance in
the Visual Question Answering (VQA) task. However, such models are likely to disregard …

Compressing and Debiasing Vision-Language Pre-Trained Models for Visual Question Answering

Q Si, Y Liu, Z Lin, P Fu, W Wang - arXiv preprint arXiv:2210.14558, 2022 - arxiv.org
Despite the excellent performance of vision-language pre-trained models (VLPs) on
conventional VQA task, they still suffer from two problems: First, VLPs tend to rely on …

Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering

J Ma, M Hu, P Wang, W Sun, L Song, H Pei… - arXiv preprint arXiv …, 2024 - arxiv.org
Audio-Visual Question Answering (AVQA) is a complex multi-modal reasoning task,
demanding intelligent systems to accurately respond to natural language queries based on …