The multi-modal fusion in visual question answering: a review of attention mechanisms
Visual Question Answering (VQA) is a significant cross-disciplinary issue in the
fields of computer vision and natural language processing that requires a computer to output …
Vision-language pre-training: Basics, recent advances, and future trends
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …
Multi-task learning with deep neural networks: A survey
M Crawshaw - arXiv preprint arXiv:2009.09796, 2020 - arxiv.org
Multi-task learning (MTL) is a subfield of machine learning in which multiple tasks are
simultaneously learned by a shared model. Such approaches offer advantages like …
Multimodal co-attention transformer for survival prediction in gigapixel whole slide images
Survival outcome prediction is a challenging weakly-supervised and ordinal regression task
in computational pathology that involves modeling complex interactions within the tumor …
Deep modular co-attention networks for visual question answering
Visual Question Answering (VQA) requires a fine-grained and simultaneous
understanding of both the visual content of images and the textual content of questions …
Changer: Feature interaction is what you need for change detection
S Fang, K Li, Z Li - IEEE Transactions on Geoscience and …, 2023 - ieeexplore.ieee.org
Change detection is an important tool for long-term Earth observation missions. It takes bi-
temporal images as input and predicts “where” the change has occurred. Different from other …
Multimodal fusion with co-attention networks for fake news detection
Y Wu, P Zhan, Y Zhang, L Wang… - Findings of the association …, 2021 - aclanthology.org
Fake news with textual and visual contents has a better story-telling ability than text-only
contents, and can be spread quickly with social media. People can be easily deceived by …
Comparing recognition performance and robustness of multimodal deep learning models for multimodal emotion recognition
Multimodal signals are powerful for emotion recognition since they can represent emotions
comprehensively. In this article, we compare the recognition performance and robustness of …
Attention, please! A survey of neural attention models in deep learning
A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer
In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …
Mukea: Multimodal knowledge extraction and accumulation for knowledge-based visual question answering
Knowledge-based visual question answering requires the ability of associating
external knowledge for open-ended cross-modal scene understanding. One limitation of …