Dynamic capsule attention for visual question answering
In visual question answering (VQA), recent advances have advocated the use of
attention mechanisms to precisely link the question to potential answer regions. As the …
Dual recurrent attention units for visual question answering
Visual Question Answering (VQA) requires AI models to comprehend data in two domains,
vision and text. Current state-of-the-art models use learned attention mechanisms to extract …
Question Modifiers in Visual Question Answering
W Britton, S Sarkhel, D Venugopal - Language Resources and …, 2022 - par.nsf.gov
Abstract Visual Question Answering (VQA) is a challenge problem that can advance AI by
integrating several important sub-disciplines including natural language understanding and …
Improving selective visual question answering by learning from your peers
C Dancette, S Whitehead… - Proceedings of the …, 2023 - openaccess.thecvf.com
Abstract Despite advances in Visual Question Answering (VQA), the ability of models to
assess their own correctness remains underexplored. Recent work has shown that VQA …
MAFA-Net: Multimodal Attribute Feature Attention Network for visual question answering
M Tang, C Ran, L Zong, J Hu, L Li - 2023 - researchsquare.com
Abstract Visual Question Answering (VQA) is a popular task of answering natural language
questions related to the content of visual images. In most VQA models, visual appearance …
DRAU: dual recurrent attention units for visual question answering
Abstract Visual Question Answering (VQA) requires AI models to comprehend data in two
domains, vision and text. Current state-of-the-art models use learned attention mechanisms …
Overcoming Language Priors with Counterfactual Inference for Visual Question Answering
Z Ren, H Wang, M Zhu, Y Wang, T Xiao… - … Linguistics: 22nd China …, 2023 - books.google.com
Recent years have seen many efforts to address the issue of language priors in the field of
Visual Question Answering (VQA). Among the extensive efforts, causal inference is regarded …
Multi-modal explicit sparse attention networks for visual question answering
Z Guo, D Han - Sensors, 2020 - mdpi.com
Visual question answering (VQA) is a multi-modal task involving natural language
processing (NLP) and computer vision (CV), which requires models to understand both …
Visual question answering as reading comprehension
Visual question answering (VQA) demands simultaneous comprehension of both the image
visual content and natural language questions. In some cases, the reasoning needs the help …
Recovering generalization via pre-training-like knowledge distillation for out-of-distribution visual question answering
With the emergence of large-scale multi-modal foundation models, significant improvements
have been made towards Visual Question Answering (VQA) in recent years via the “Pre …