A comprehensive survey of deep learning for image captioning
Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …
Human gaze assisted artificial intelligence: A review
Human gaze reveals a wealth of information about internal cognitive state. Thus, gaze-
related research has significantly increased in computer vision, natural language …
ReCo: Retrieve and co-segment for zero-shot transfer
Semantic segmentation has a broad range of applications, but its real-world impact has
been significantly limited by the prohibitive annotation costs necessary to enable …
Latent embeddings for zero-shot classification
We present a novel latent embedding model for learning a compatibility function between
image and class embeddings, in the context of zero-shot classification. The proposed …
Class attention network for image recognition
Visual attention has become a popular and widely used component for image recognition.
Although various attention-based methods have been proposed and achieved relatively …
What's the point: Semantic segmentation with point supervision
The semantic image segmentation task presents a trade-off between test time accuracy and
training time annotation cost. Detailed per-pixel annotations enable training accurate …
Evaluating weakly supervised object localization methods right
Weakly-supervised object localization (WSOL) has gained popularity over the last years for
its promise to train localization models with only image-level labels. Since the seminal …
SALICON: Saliency in context
Saliency in Context (SALICON) is an ongoing effort that aims at understanding and
predicting visual attention. This paper presents a new method to collect large-scale human …
Visual attention consistency under image transforms for multi-label image classification
Human visual perception shows good consistency for many multi-label image classification
tasks under certain spatial transforms, such as scaling, rotation, flipping and translation. This …
Large-scale interactive object segmentation with human annotators
Manually annotating object segmentation masks is very time consuming. Interactive object
segmentation methods offer a more efficient alternative where a human annotator and a …