From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e. describing images …

A comprehensive survey of deep learning for image captioning

MDZ Hossain, F Sohel, MF Shiratuddin… - ACM Computing Surveys …, 2019 - dl.acm.org
Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …

Generalizing face forgery detection with high-frequency features

Y Luo, Y Zhang, J Yan, W Liu - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Current face forgery detection methods achieve high accuracy under the within-database
scenario where training and testing forgeries are synthesized by the same algorithm …

Meshed-memory transformer for image captioning

M Cornia, M Stefanini, L Baraldi… - Proceedings of the …, 2020 - openaccess.thecvf.com
Transformer-based architectures represent the state of the art in sequence modeling tasks
like machine translation and language understanding. Their applicability to multi-modal …

Cross attention network for few-shot classification

R Hou, H Chang, B Ma, S Shan… - Advances in neural …, 2019 - proceedings.neurips.cc
Few-shot classification aims to recognize unlabeled samples from unseen classes given
only few labeled samples. The unseen classes and low-data problem make few-shot …

Classification of remote sensing images using EfficientNet-B3 CNN model with attention

H Alhichri, AS Alswayed, Y Bazi, N Ammour… - IEEE …, 2021 - ieeexplore.ieee.org
Scene classification is a highly useful task in Remote Sensing (RS) applications. Many
efforts have been made to improve the accuracy of RS scene classification. Scene …

Attention, please! A survey of neural attention models in deep learning

A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer
In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …

Multi-scale self-guided attention for medical image segmentation

A Sinha, J Dolz - IEEE journal of biomedical and health …, 2020 - ieeexplore.ieee.org
Even though convolutional neural networks (CNNs) are driving progress in medical image
segmentation, standard models still have some drawbacks. First, the use of multi-scale …

Entangled transformer for image captioning

G Li, L Zhu, P Liu, Y Yang - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com
In image captioning, the typical attention mechanisms are arduous to identify the equivalent
visual signals especially when predicting highly abstract words. This phenomenon is known …

Bottom-up and top-down attention for image captioning and visual question answering

P Anderson, X He, C Buehler… - Proceedings of the …, 2018 - openaccess.thecvf.com
Top-down visual attention mechanisms have been used extensively in image captioning
and visual question answering (VQA) to enable deeper image understanding through fine …