Digging out discrimination information from generated samples for robust visual question answering
Visual Question Answering (VQA) aims to answer a textual question based on a
given image. Nevertheless, recent studies have shown that VQA models tend to capture the …
Causal Reasoning through Two Cognition Layers for Improving Generalization in Visual Question Answering
T Nguyen, N Okazaki - Proceedings of the 2023 Conference on …, 2023 - aclanthology.org
Generalization in Visual Question Answering (VQA) requires models to answer
questions about images with contexts beyond the training distribution. Existing attempts …
Robust visual question answering: Datasets, methods, and future challenges
J Ma, P Wang, D Kong, Z Wang, J Liu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Visual question answering requires a system to provide an accurate natural language
answer given an image and a natural language question. However, it is widely recognized …
Object Attribute Matters in Visual Question Answering
Visual question answering is a multimodal task that requires the joint comprehension of
visual and textual information. However, integrating visual and textual semantics solely …
Balancing and contrasting biased samples for debiased visual question answering
R Cao, Z Li - 2023 IEEE international conference on data …, 2023 - ieeexplore.ieee.org
The goal of Visual Question Answering (VQA) is to test the reasoning ability of an intelligent
agent by evaluating visual and textual information. However, recent studies suggest that …
Simple contrastive learning in a self-supervised manner for robust visual question answering
Recent observations have revealed that Visual Question Answering models are susceptible
to learning the spurious correlations formed by dataset biases, i.e., the language priors …
Margin and Shared Proxies: Advanced Proxy Anchor Loss for Out-of-Domain Intent Classification
J Park, B Kim, S Han, S Ji, J Rhee - Applied Sciences, 2024 - mdpi.com
Out-of-Domain (OOD) intent classification is an important task for a dialog system, as it
allows for appropriate responses to be generated. Previous studies aiming to solve the OOD …
Interpretable visual question answering via reasoning supervision
Transformer-based architectures have recently demonstrated remarkable performance in
the Visual Question Answering (VQA) task. However, such models are likely to disregard …
Compressing and Debiasing Vision-Language Pre-Trained Models for Visual Question Answering
Despite the excellent performance of vision-language pre-trained models (VLPs) on
conventional VQA task, they still suffer from two problems: First, VLPs tend to rely on …
Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering
J Ma, M Hu, P Wang, W Sun, L Song, H Pei… - arXiv preprint arXiv …, 2024 - arxiv.org
Audio-Visual Question Answering (AVQA) is a complex multi-modal reasoning task,
demanding intelligent systems to accurately respond to natural language queries based on …