ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks
We present ViLBERT (short for Vision-and-Language BERT), a model for learning
task-agnostic joint representations of image content and natural language. We extend the …
Occupancy anticipation for efficient exploration and navigation
State-of-the-art navigation methods leverage a spatial memory to generalize to new
environments, but their occupancy maps are limited to capturing the geometric structures …
Learning to reconstruct shapes from unseen classes
From a single image, humans are able to perceive the full 3D shape of an object by
exploiting learned shape priors from everyday life. Contemporary single-image 3D …
VisualEchoes: Spatial Image Representation Learning Through Echolocation
Several animal species (e.g., bats, dolphins, and whales) and even visually impaired humans
have the remarkable ability to perform echolocation: a biological sonar used to perceive …
Multi-view action recognition using cross-view video prediction
In this work, we address the problem of action recognition in a multi-view environment. Most
of the existing approaches utilize pose information for multi-view action recognition. We …
Learning to look around: Intelligently exploring unseen environments for unknown tasks
D Jayaraman, K Grauman - Proceedings of the IEEE …, 2018 - openaccess.thecvf.com
It is common to implicitly assume access to intelligently captured inputs (e.g., photos from a
human photographer), yet autonomously capturing good observations is itself a major …
Visual content-based web page categorization with deep transfer learning and metric learning
D López-Sánchez, AG Arrieta, JM Corchado - Neurocomputing, 2019 - Elsevier
The growing amounts of online multimedia content challenge the current search,
recommendation and information retrieval systems. Information in the form of visual …
Embodied AI‐Driven Operation of Smart Cities: A Concise Review
F Shenavarmasouleh, FG Mohammadi… - Cyberphysical Smart …, 2022 - Wiley Online Library
An undeniable part of a smart city is its use of smart agents. These agents can vary a lot in
sizes, shapes, and functionalities. Embodied artificial intelligence is the field of study that …
Extreme relative pose estimation for RGB-D scans via scene completion
Estimating the relative rigid pose between two RGB-D scans of the same underlying
environment is a fundamental problem in computer vision, robotics, and computer graphics …
End-to-end policy learning for active visual categorization
D Jayaraman, K Grauman - IEEE transactions on pattern …, 2018 - ieeexplore.ieee.org
Visual recognition systems mounted on autonomous moving agents face the challenge of
unconstrained data, but simultaneously have the opportunity to improve their performance …