Explicit ensemble attention learning for improving visual question answering

V Lioutas, N Passalis, A Tefas - Pattern Recognition Letters, 2018 - Elsevier
Visual Question Answering (VQA) is among the most difficult multi-modal problems
as it requires a machine to be able to properly understand a question about a reference …

Learning content and context with language bias for visual question answering

C Yang, S Feng, D Li, H Shen… - … on Multimedia and …, 2021 - ieeexplore.ieee.org
Visual Question Answering (VQA) is a challenging multi-modal task to answer questions
about an image. Many works concentrate on how to reduce language bias which makes …

Good, better, best: Textual distractors generation for multiple-choice visual question answering via reinforcement learning

J Lu, X Ye, Y Ren, Y Yang - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com
Multiple-choice VQA has drawn increasing attention from researchers and end-users
recently. As the demand for automatically constructing large-scale multiple-choice VQA data …

Context relation fusion model for visual question answering

H Zhang, W Wu - 2022 IEEE International Conference on Image …, 2022 - ieeexplore.ieee.org
Traditional VQA models tend to rely on language priors as a shortcut to answer questions
and neglect visual information. To solve this problem, the latest approaches divide language …

Learning rich image region representation for visual question answering

B Liu, Z Huang, Z Zeng, Z Chen, J Fu - arXiv preprint arXiv:1910.13077, 2019 - arxiv.org
We propose to boost VQA by leveraging more powerful feature extractors by improving the
representation ability of both visual and text features and the ensemble of models. For visual …

Answer again: Improving VQA with cascaded-answering model

L Peng, Y Yang, X Zhang, Y Ji, H Lu… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Visual Question Answering (VQA) is a very challenging task, which requires to understand
visual images and natural language questions simultaneously. In the open-ended VQA task …

Enhanced Visual Question Answering: A Comparative Analysis and Textual Feature Extraction Via Convolutions

Z Zhang - arXiv preprint arXiv:2405.00479, 2024 - arxiv.org
Visual Question Answering (VQA) has emerged as a highly engaging field in recent years,
attracting increasing research efforts aiming to enhance VQA accuracy through the …

Semantic multi-modal reprojection for robust visual question answering

A Mashrur, W Luo, NA Zaidi… - … Conference on Digital …, 2022 - ieeexplore.ieee.org
Despite recent progress in the development of vision-language models in accurate visual
question answering (VQA), the robustness of these models is still quite limited in the …

Jointly learning attentions with semantic cross-modal correlation for visual question answering

L Cao, L Gao, J Song, X Xu, HT Shen - Databases Theory and …, 2017 - Springer
Visual Question Answering (VQA) has emerged as a prominent multi-discipline
research problem in artificial intelligence. A number of recent studies are focusing on …

Multitask learning for visual question answering

J Ma, J Liu, Q Lin, B Wu, Y Wang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Visual question answering (VQA) is a task that machines should provide an accurate natural
language answer given an image and a question about the image. Many studies have found …