From show to tell: A survey on deep learning-based image captioning
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …
A comprehensive survey of deep learning for image captioning
Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …
Generalizing face forgery detection with high-frequency features
Current face forgery detection methods achieve high accuracy under the within-database
scenario where training and testing forgeries are synthesized by the same algorithm …
Meshed-memory transformer for image captioning
Transformer-based architectures represent the state of the art in sequence modeling tasks
like machine translation and language understanding. Their applicability to multi-modal …
Cross attention network for few-shot classification
Few-shot classification aims to recognize unlabeled samples from unseen classes given
only few labeled samples. The unseen classes and low-data problem make few-shot …
Classification of remote sensing images using EfficientNet-B3 CNN model with attention
Scene classification is a highly useful task in Remote Sensing (RS) applications. Many
efforts have been made to improve the accuracy of RS scene classification. Scene …
Attention, please! A survey of neural attention models in deep learning
A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer
In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …
Multi-scale self-guided attention for medical image segmentation
Even though convolutional neural networks (CNNs) are driving progress in medical image
segmentation, standard models still have some drawbacks. First, the use of multi-scale …
Entangled transformer for image captioning
In image captioning, the typical attention mechanisms are arduous to identify the equivalent
visual signals especially when predicting highly abstract words. This phenomenon is known …
Bottom-up and top-down attention for image captioning and visual question answering
P Anderson, X He, C Buehler… - Proceedings of the …, 2018 - openaccess.thecvf.com
Top-down visual attention mechanisms have been used extensively in image captioning
and visual question answering (VQA) to enable deeper image understanding through fine …