ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks

J Lu, D Batra, D Parikh, S Lee - Advances in neural …, 2019 - proceedings.neurips.cc
We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-
agnostic joint representations of image content and natural language. We extend the …

Occupancy anticipation for efficient exploration and navigation

SK Ramakrishnan, Z Al-Halah, K Grauman - Computer Vision–ECCV 2020 …, 2020 - Springer
State-of-the-art navigation methods leverage a spatial memory to generalize to new
environments, but their occupancy maps are limited to capturing the geometric structures …

Learning to reconstruct shapes from unseen classes

X Zhang, Z Zhang, C Zhang… - Advances in neural …, 2018 - proceedings.neurips.cc
From a single image, humans are able to perceive the full 3D shape of an object by
exploiting learned shape priors from everyday life. Contemporary single-image 3D …

VisualEchoes: Spatial Image Representation Learning Through Echolocation

R Gao, C Chen, Z Al-Halah, C Schissler… - Computer Vision–ECCV …, 2020 - Springer
Several animal species (e.g., bats, dolphins, and whales) and even visually impaired humans
have the remarkable ability to perform echolocation: a biological sonar used to perceive …

Multi-view action recognition using cross-view video prediction

S Vyas, YS Rawat, M Shah - Computer Vision–ECCV 2020: 16th European …, 2020 - Springer
In this work, we address the problem of action recognition in a multi-view environment. Most
of the existing approaches utilize pose information for multi-view action recognition. We …

Learning to look around: Intelligently exploring unseen environments for unknown tasks

D Jayaraman, K Grauman - Proceedings of the IEEE …, 2018 - openaccess.thecvf.com
It is common to implicitly assume access to intelligently captured inputs (e.g., photos from a
human photographer), yet autonomously capturing good observations is itself a major …

Visual content-based web page categorization with deep transfer learning and metric learning

D López-Sánchez, AG Arrieta, JM Corchado - Neurocomputing, 2019 - Elsevier
The growing amounts of online multimedia content challenge the current search,
recommendation and information retrieval systems. Information in the form of visual …

Embodied AI‐Driven Operation of Smart Cities: A Concise Review

F Shenavarmasouleh, FG Mohammadi… - Cyberphysical Smart …, 2022 - Wiley Online Library
An undeniable part of a smart city is its use of smart agents. These agents can vary a lot in
sizes, shapes, and functionalities. Embodied artificial intelligence is the field of study that …

Extreme relative pose estimation for RGB-D scans via scene completion

Z Yang, JZ Pan, L Luo, X Zhou… - Proceedings of the …, 2019 - openaccess.thecvf.com
Estimating the relative rigid pose between two RGB-D scans of the same underlying
environment is a fundamental problem in computer vision, robotics, and computer graphics …

End-to-end policy learning for active visual categorization

D Jayaraman, K Grauman - IEEE transactions on pattern …, 2018 - ieeexplore.ieee.org
Visual recognition systems mounted on autonomous moving agents face the challenge of
unconstrained data, but simultaneously have the opportunity to improve their performance …