Dynamic capsule attention for visual question answering

Y Zhou, R Ji, J Su, X Sun, W Chen - … of the AAAI conference on artificial …, 2019 - ojs.aaai.org
In visual question answering (VQA), recent advances have widely advocated the use of
attention mechanisms to precisely link the question to the potential answer areas. As the …

Dual recurrent attention units for visual question answering

A Osman, W Samek - arXiv preprint arXiv:1802.00209, 2018 - arxiv.org
Visual Question Answering (VQA) requires AI models to comprehend data in two domains,
vision and text. Current state-of-the-art models use learned attention mechanisms to extract …

Question Modifiers in Visual Question Answering

W Britton, S Sarkhel, D Venugopal - Language Resources and …, 2022 - par.nsf.gov
Visual Question Answering (VQA) is a challenge problem that can advance AI by
integrating several important sub-disciplines including natural language understanding and …

Improving selective visual question answering by learning from your peers

C Dancette, S Whitehead… - Proceedings of the …, 2023 - openaccess.thecvf.com
Despite advances in Visual Question Answering (VQA), the ability of models to
assess their own correctness remains underexplored. Recent work has shown that VQA …

MAFA-Net: Multimodal Attribute Feature Attention Network for visual question answering

M Tang, C Ran, L Zong, J Hu, L Li - 2023 - researchsquare.com
Visual Question Answering (VQA) is a popular task that aims to answer natural language
questions related to the content of visual images. In most VQA models, visual appearance …

DRAU: dual recurrent attention units for visual question answering

A Osman, W Samek - Computer Vision and Image Understanding, 2019 - Elsevier
Visual Question Answering (VQA) requires AI models to comprehend data in two
domains, vision and text. Current state-of-the-art models use learned attention mechanisms …

Overcoming Language Priors with Counterfactual Inference for Visual Question Answering

Z Ren, H Wang, M Zhu, Y Wang, T Xiao… - … Linguistics: 22nd China …, 2023 - books.google.com
Recent years have seen many efforts to address the issue of language priors in the field of
Visual Question Answering (VQA). Among these extensive efforts, causal inference is regarded …

Multi-modal explicit sparse attention networks for visual question answering

Z Guo, D Han - Sensors, 2020 - mdpi.com
Visual question answering (VQA) is a multi-modal task involving natural language
processing (NLP) and computer vision (CV), which requires models to understand both …

Visual question answering as reading comprehension

H Li, P Wang, C Shen, A Hengel - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Visual question answering (VQA) demands simultaneous comprehension of both the image
visual content and natural language questions. In some cases, the reasoning needs the help …

Recovering generalization via pre-training-like knowledge distillation for out-of-distribution visual question answering

Y Song, X Yang, Y Wang, C Xu - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
With the emergence of large-scale multi-modal foundation models, significant improvements
have been made towards Visual Question Answering (VQA) in recent years via the “Pre …