Explicit ensemble attention learning for improving visual question answering
Visual Question Answering (VQA) is among the most difficult multi-modal problems
as it requires a machine to be able to properly understand a question about a reference …
Learning content and context with language bias for visual question answering
C Yang, S Feng, D Li, H Shen… - … on Multimedia and …, 2021 - ieeexplore.ieee.org
Visual Question Answering (VQA) is a challenging multi-modal task to answer questions
about an image. Many works concentrate on how to reduce language bias which makes …
Good, better, best: Textual distractors generation for multiple-choice visual question answering via reinforcement learning
Multiple-choice VQA has drawn increasing attention from researchers and end-users
recently. As the demand for automatically constructing large-scale multiple-choice VQA data …
Context relation fusion model for visual question answering
H Zhang, W Wu - 2022 IEEE International Conference on Image …, 2022 - ieeexplore.ieee.org
Traditional VQA models tend to rely on language priors as a shortcut to answer questions
and neglect visual information. To solve this problem, the latest approaches divide language …
Learning rich image region representation for visual question answering
We propose to boost VQA by leveraging more powerful feature extractors by improving the
representation ability of both visual and text features and the ensemble of models. For visual …
Answer again: Improving VQA with cascaded-answering model
Visual Question Answering (VQA) is a very challenging task, which requires understanding
visual images and natural language questions simultaneously. In the open-ended VQA task …
Enhanced Visual Question Answering: A Comparative Analysis and Textual Feature Extraction Via Convolutions
Z Zhang - arXiv preprint arXiv:2405.00479, 2024 - arxiv.org
Visual Question Answering (VQA) has emerged as a highly engaging field in recent years,
attracting increasing research efforts aiming to enhance VQA accuracy through the …
Semantic multi-modal reprojection for robust visual question answering
Despite recent progress in the development of vision-language models in accurate visual
question answering (VQA), the robustness of these models is still quite limited in the …
Jointly learning attentions with semantic cross-modal correlation for visual question answering
Visual Question Answering (VQA) has emerged as a prominent multi-discipline
research problem in artificial intelligence. A number of recent studies are focusing on …
Multitask learning for visual question answering
Visual question answering (VQA) is a task in which machines should provide an accurate natural
language answer given an image and a question about the image. Many studies have found …