Dynamic capsule attention for visual question answering
In visual question answering (VQA), recent advances have advocated the use of
attention mechanisms to precisely link the question to potential answer regions. As the …
Dual recurrent attention units for visual question answering
Visual Question Answering (VQA) requires AI models to comprehend data in two domains,
vision and text. Current state-of-the-art models use learned attention mechanisms to extract …
Question Modifiers in Visual Question Answering
W Britton, S Sarkhel, D Venugopal - Language Resources and …, 2022 - par.nsf.gov
Abstract Visual Question Answering (VQA) is a challenge problem that can advance AI by
integrating several important sub-disciplines including natural language understanding and …
Improving selective visual question answering by learning from your peers
C Dancette, S Whitehead… - Proceedings of the …, 2023 - openaccess.thecvf.com
Abstract Despite advances in Visual Question Answering (VQA), the ability of models to
assess their own correctness remains underexplored. Recent work has shown that VQA …
MAFA-Net: Multimodal Attribute Feature Attention Network for visual question answering
M Tang, C Ran, L Zong, J Hu, L Li - 2023 - researchsquare.com
Abstract Visual Question Answering (VQA) is a popular task of answering natural language
questions related to the content of visual images. In most VQA models, visual appearance …
DRAU: dual recurrent attention units for visual question answering
Abstract Visual Question Answering (VQA) requires AI models to comprehend data in two
domains, vision and text. Current state-of-the-art models use learned attention mechanisms …
Overcoming Language Priors with Counterfactual Inference for Visual Question Answering
Z Ren, H Wang, M Zhu, Y Wang, T Xiao… - … Linguistics: 22nd China …, 2023 - books.google.com
Recent years have seen many efforts to address the issue of language priors in the field of
Visual Question Answering (VQA). Among the extensive efforts, causal inference is regarded …
Multi-modal explicit sparse attention networks for visual question answering
Z Guo, D Han - Sensors, 2020 - mdpi.com
Visual question answering (VQA) is a multi-modal task involving natural language
processing (NLP) and computer vision (CV), which requires models to understand both …
Visual question answering as reading comprehension
Visual question answering (VQA) demands simultaneous comprehension of both the image
visual content and natural language questions. In some cases, the reasoning needs the help …
Recovering generalization via pre-training-like knowledge distillation for out-of-distribution visual question answering
With the emergence of large-scale multi-modal foundation models, significant improvements
have been made towards Visual Question Answering (VQA) in recent years via the “Pre …