Multimodal research in vision and language: A review of current and emerging trends
Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …
A survey of data augmentation approaches for NLP
Data augmentation has recently seen increased interest in NLP due to more work in low-
resource domains, new tasks, and the popularity of large-scale neural networks that require …
Teaching structured vision & language concepts to vision & language models
Vision and Language (VL) models have demonstrated remarkable zero-shot performance in
a variety of tasks. However, some aspects of complex language understanding still remain a …
MixGen: A new multi-modal data augmentation
Data augmentation is a necessity to enhance data efficiency in deep learning. For vision-
language pre-training, data is only augmented either for images or for text in previous works …
Adversarial attack and defense technologies in natural language processing: A survey
S Qiu, Q Liu, S Zhou, W Huang - Neurocomputing, 2022 - Elsevier
Recently, the adversarial attack and defense technology has made remarkable
achievements and has been widely applied in the computer vision field, promoting its rapid …
Rethinking data augmentation for robust visual question answering
Data Augmentation (DA)—generating extra training samples beyond the original training set—
has been widely-used in today's unbiased VQA models to mitigate language biases. Current …
SimVQA: Exploring simulated environments for visual question answering
Existing work on VQA explores data augmentation to achieve better generalization by
perturbing the images in the dataset or modifying the existing questions and answers. While …
Unshuffling data for improved generalization in visual question answering
D Teney, E Abbasnejad… - Proceedings of the …, 2021 - openaccess.thecvf.com
Generalization beyond the training distribution is a core challenge in machine learning. The
common practice of mixing and shuffling examples when training neural networks may not …
VQAMix: Conditional triplet mixup for medical visual question answering
Medical visual question answering (VQA) aims to correctly answer a clinical question related
to a given medical image. Nevertheless, owing to the expensive manual annotations of …
A survey on VQA: Datasets and approaches
Y Zou, Q Xie - 2020 2nd International Conference on …, 2020 - ieeexplore.ieee.org
Visual question answering (VQA) is a task that combines both the techniques of computer
vision and natural language processing. It requires models to answer a text-based question …