The multi-modal fusion in visual question answering: a review of attention mechanisms
Visual Question Answering (VQA) is a significant cross-disciplinary issue in the
fields of computer vision and natural language processing that requires a computer to output …
Graph neural networks: foundation, frontiers and applications
The field of graph neural networks (GNNs) has seen rapid and incredible strides over the
recent years. Graph neural networks, also known as deep learning on graphs, graph …
Seeing out of the box: End-to-end pre-training for vision-language representation learning
We study on joint learning of Convolutional Neural Network (CNN) and Transformer for
vision-language pre-training (VLPT) which aims to learn cross-modal alignments from …
Towards zero-shot learning: A brief review and an attention-based embedding network
Zero-shot learning (ZSL), an emerging topic in recent years, targets at distinguishing unseen
class images by taking images from seen classes for training the classifier. Existing works …
Attention on attention for image captioning
Attention mechanisms are widely used in current encoder/decoder frameworks of image
captioning, where a weighted average on encoded vectors is generated at each time step to …
Cross attention network for few-shot classification
Few-shot classification aims to recognize unlabeled samples from unseen classes given
only few labeled samples. The unseen classes and low-data problem make few-shot …
Attention, please! A survey of neural attention models in deep learning
A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer
In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …
TransVPR: Transformer-based place recognition with multi-level attention aggregation
Visual place recognition is a challenging task for applications such as autonomous driving
navigation and mobile robot localization. Distracting elements presenting in complex scenes …
CAMP: Cross-modal adaptive message passing for text-image retrieval
Text-image cross-modal retrieval is a challenging task in the field of language and vision.
Most previous approaches independently embed images and sentences into a joint …
Relation-aware graph attention network for visual question answering
In order to answer semantically-complicated questions about an image, a Visual Question
Answering (VQA) model needs to fully understand the visual scene in the image, especially …