Multimodal research in vision and language: A review of current and emerging trends

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier
Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …

A survey of data augmentation approaches for NLP

SY Feng, V Gangal, J Wei, S Chandar… - arXiv preprint arXiv …, 2021 - arxiv.org
Data augmentation has recently seen increased interest in NLP due to more work in
low-resource domains, new tasks, and the popularity of large-scale neural networks that require …

Teaching structured vision & language concepts to vision & language models

S Doveh, A Arbelle, S Harary… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision and Language (VL) models have demonstrated remarkable zero-shot performance in
a variety of tasks. However, some aspects of complex language understanding still remain a …

MixGen: A new multi-modal data augmentation

X Hao, Y Zhu, S Appalaraju, A Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Data augmentation is a necessity to enhance data efficiency in deep learning. For
vision-language pre-training, data is only augmented either for images or for text in previous works …

Adversarial attack and defense technologies in natural language processing: A survey

S Qiu, Q Liu, S Zhou, W Huang - Neurocomputing, 2022 - Elsevier
Recently, the adversarial attack and defense technology has made remarkable
achievements and has been widely applied in the computer vision field, promoting its rapid …

Rethinking data augmentation for robust visual question answering

L Chen, Y Zheng, J Xiao - European conference on computer vision, 2022 - Springer
Data Augmentation (DA)—generating extra training samples beyond the original training set—
has been widely-used in today's unbiased VQA models to mitigate language biases. Current …

SimVQA: Exploring simulated environments for visual question answering

P Cascante-Bonilla, H Wu, L Wang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Existing work on VQA explores data augmentation to achieve better generalization by
perturbing the images in the dataset or modifying the existing questions and answers. While …

Unshuffling data for improved generalization in visual question answering

D Teney, E Abbasnejad… - Proceedings of the …, 2021 - openaccess.thecvf.com
Generalization beyond the training distribution is a core challenge in machine learning. The
common practice of mixing and shuffling examples when training neural networks may not …

VQAMix: Conditional triplet mixup for medical visual question answering

H Gong, G Chen, M Mao, Z Li… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Medical visual question answering (VQA) aims to correctly answer a clinical question related
to a given medical image. Nevertheless, owing to the expensive manual annotations of …

A survey on VQA: Datasets and approaches

Y Zou, Q Xie - 2020 2nd International Conference on …, 2020 - ieeexplore.ieee.org
Visual question answering (VQA) is a task that combines both the techniques of computer
vision and natural language processing. It requires models to answer a text-based question …