Robust visual question answering via semantic cross modal augmentation

A Mashrur, W Luo, NA Zaidi, A Robles-Kelly - Computer Vision and Image …, 2024 - Elsevier
Recent advances in vision-language models have resulted in improved accuracy in visual
question answering (VQA) tasks. However, their robustness remains limited when faced with …

Deep Fuzzy Multi-Teacher Distillation Network for Medical Visual Question Answering

Y Liu, B Chen, S Wang, G Lu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Medical visual question answering (Medical VQA) is a critical cross-modal interaction task
that has garnered considerable attention in the medical domain. Several existing methods …

Improving Data Augmentation for Robust Visual Question Answering with Effective Curriculum Learning

Y Zheng, Z Wang, L Chen - … of the 2024 International Conference on …, 2024 - dl.acm.org
Being widely used in learning unbiased visual question answering (VQA) models, Data
Augmentation (DA) helps mitigate language biases by generating extra training samples …

Detecting Any instruction-to-answer interaction relationship: Universal Instruction-to-Answer Navigator for Med-VQA

Z Wu, H Xu, Y Long, S You, X Su, J Long, Y Luo… - Forty-first International … - openreview.net
Medical Visual Question Answering (Med-VQA) interprets complex medical imagery using
user instructions for precise diagnostics, yet faces challenges due to diverse, inadequately …

More from Less: Learning with Limited Annotated Data in Vision and Language

P Cascante-Bonilla - 2024 - repository.rice.edu
This dissertation stems from two primary concerns in the fields of Computer Vision and
Natural Language Processing: to what extent we can learn from less annotated data and …