Deep Multimodal Data Fusion

F Zhao, C Zhang, B Geng - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal Artificial Intelligence (Multimodal AI), in general, involves various types of data
(e.g., images, texts, or data collected from different sensors), feature engineering (e.g., …

Contrastive region guidance: Improving grounding in vision-language models without training

D Wan, J Cho, E Stengel-Eskin, M Bansal - European Conference on …, 2025 - Springer
Highlighting particularly relevant regions of an image can improve the performance of vision-
language models (VLMs) on various vision-language (VL) tasks by guiding the model to …

A comprehensive survey on answer generation methods using NLP

P Upadhyay, R Agarwal, S Dhiman, A Sarkar… - Natural Language …, 2024 - Elsevier
Recent advancements in question-answering systems have significantly enhanced the
capability of computers to understand and respond to queries in natural language. This …

Question-conditioned debiasing with focal visual context fusion for visual question answering

J Liu, GX Wang, CF Fan, F Zhou, HJ Xu - Knowledge-Based Systems, 2023 - Elsevier
Existing Visual Question Answering models suffer from the language prior, where
the answers provided by the models overly rely on the correlations between questions and …

Signing outside the studio: Benchmarking background robustness for continuous sign language recognition

Y Jang, Y Oh, JW Cho, DJ Kim, JS Chung… - arXiv preprint arXiv …, 2022 - arxiv.org
The goal of this work is background-robust continuous sign language recognition. Most
existing Continuous Sign Language Recognition (CSLR) benchmarks have fixed …

Enhancing robust VQA via contrastive and self-supervised learning

R Cao, Z Li, Z Tang, C Zhang, H Ma - Pattern Recognition, 2025 - Elsevier
Visual Question Answering (VQA) aims to evaluate the reasoning abilities of an
intelligent agent using visual and textual information. However, recent research indicates …

Robust visual question answering via polarity enhancement and contrast

D Peng, Z Li - Neural Networks, 2024 - Elsevier
The Visual Question Answering (VQA) task is an important research direction in the
field of artificial intelligence, which requires a model that can simultaneously understand …

Robust Visual Question Answering utilizing Bias Instances and Label Imbalance

L Zhao, K Li, J Qi, Y Sun, Z Zhu - Knowledge-Based Systems, 2024 - Elsevier
Visual Question Answering (VQA) models often suffer from bias issues, which cause
their predictions to rely on superficial correlations in datasets rather than the intrinsic …

Towards Deconfounded Visual Question Answering via Dual-causal Intervention

D Peng, W Wei - Proceedings of the 33rd ACM International Conference …, 2024 - dl.acm.org
The Visual Question Answering (VQA) task has recently become notorious because models
are prone to predicting well-educated "guesses" as answers rather than deriving them …

Counterfactual Mix-up for visual question answering

JW Cho, DJ Kim, Y Jung, IS Kweon - IEEE Access, 2023 - ieeexplore.ieee.org
Counterfactuals have been shown to be a powerful method for alleviating unimodal bias in
Visual Question Answering. However, existing …