A comprehensive survey of deep learning for image captioning

MDZ Hossain, F Sohel, MF Shiratuddin… - ACM Computing Surveys …, 2019 - dl.acm.org
Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …

Human gaze assisted artificial intelligence: A review

R Zhang, A Saran, B Liu, Y Zhu, S Guo… - IJCAI: Proceedings of …, 2020 - ncbi.nlm.nih.gov
Human gaze reveals a wealth of information about internal cognitive state. Thus, gaze-related
research has significantly increased in computer vision, natural language …

ReCo: Retrieve and co-segment for zero-shot transfer

G Shin, W Xie, S Albanie - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Semantic segmentation has a broad range of applications, but its real-world impact has
been significantly limited by the prohibitive annotation costs necessary to enable …

Latent embeddings for zero-shot classification

Y Xian, Z Akata, G Sharma, Q Nguyen… - Proceedings of the …, 2016 - openaccess.thecvf.com
We present a novel latent embedding model for learning a compatibility function between
image and class embeddings, in the context of zero-shot classification. The proposed …

Class attention network for image recognition

G Cheng, P Lai, D Gao, J Han - Science China Information Sciences, 2023 - Springer
Visual attention has become a popular and widely used component for image recognition.
Although various attention-based methods have been proposed and achieved relatively …

What's the point: Semantic segmentation with point supervision

A Bearman, O Russakovsky, V Ferrari… - European conference on …, 2016 - Springer
The semantic image segmentation task presents a trade-off between test time accuracy and
training time annotation cost. Detailed per-pixel annotations enable training accurate …

Evaluating weakly supervised object localization methods right

J Choe, SJ Oh, S Lee, S Chun… - Proceedings of the …, 2020 - openaccess.thecvf.com
Weakly-supervised object localization (WSOL) has gained popularity over the last years for
its promise to train localization models with only image-level labels. Since the seminal …

SALICON: Saliency in context

M Jiang, S Huang, J Duan, Q Zhao - Proceedings of the IEEE …, 2015 - cv-foundation.org
Saliency in Context (SALICON) is an ongoing effort that aims at understanding and
predicting visual attention. This paper presents a new method to collect large-scale human …

Visual attention consistency under image transforms for multi-label image classification

H Guo, K Zheng, X Fan, H Yu… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Human visual perception shows good consistency for many multi-label image classification
tasks under certain spatial transforms, such as scaling, rotation, flipping and translation. This …

Large-scale interactive object segmentation with human annotators

R Benenson, S Popov, V Ferrari - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Manually annotating object segmentation masks is very time consuming. Interactive object
segmentation methods offer a more efficient alternative where a human annotator and a …