Robust visual question answering via semantic cross modal augmentation
Recent advances in vision-language models have resulted in improved accuracy in visual
question answering (VQA) tasks. However, their robustness remains limited when faced with …
Deep Fuzzy Multi-Teacher Distillation Network for Medical Visual Question Answering
Medical visual question answering (Medical VQA) is a critical cross-modal interaction task
that has garnered considerable attention in the medical domain. Several existing methods …
Improving Data Augmentation for Robust Visual Question Answering with Effective Curriculum Learning
Being widely used in learning unbiased visual question answering (VQA) models, Data
Augmentation (DA) helps mitigate language biases by generating extra training samples …
Detecting Any instruction-to-answer interaction relationship: Universal Instruction-to-Answer Navigator for Med-VQA
Medical Visual Question Answering (Med-VQA) interprets complex medical imagery using
user instructions for precise diagnostics, yet faces challenges due to diverse, inadequately …
More from Less: Learning with Limited Annotated Data in Vision and Language
P Cascante-Bonilla - 2024 - repository.rice.edu
This dissertation stems from two primary concerns in the fields of Computer Vision and
Natural Language Processing: to what extent we can learn from less annotated data and …