- Academic resource search

ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks

J Lu, D Batra, D Parikh, S Lee - Advances in neural …, 2019 - proceedings.neurips.cc

We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-
agnostic joint representations of image content and natural language. We extend the …

Cited by 4047 · Related articles · All 8 versions

[PDF] arxiv.org

Occupancy anticipation for efficient exploration and navigation

SK Ramakrishnan, Z Al-Halah, K Grauman - Computer Vision–ECCV 2020 …, 2020 - Springer

State-of-the-art navigation methods leverage a spatial memory to generalize to new
environments, but their occupancy maps are limited to capturing the geometric structures …

Cited by 172 · Related articles · All 11 versions

[PDF] neurips.cc

Learning to reconstruct shapes from unseen classes

X Zhang, Z Zhang, C Zhang… - Advances in neural …, 2018 - proceedings.neurips.cc

From a single image, humans are able to perceive the full 3D shape of an object by
exploiting learned shape priors from everyday life. Contemporary single-image 3D …

Cited by 176 · Related articles · All 7 versions

[PDF] arxiv.org

VisualEchoes: Spatial Image Representation Learning Through Echolocation

R Gao, C Chen, Z Al-Halah, C Schissler… - Computer Vision–ECCV …, 2020 - Springer

Several animal species (e.g., bats, dolphins, and whales) and even visually impaired humans
have the remarkable ability to perform echolocation: a biological sonar used to perceive …

Cited by 103 · Related articles · All 11 versions

[PDF] ecva.net

Multi-view action recognition using cross-view video prediction

S Vyas, YS Rawat, M Shah - Computer Vision–ECCV 2020: 16th European …, 2020 - Springer

In this work, we address the problem of action recognition in a multi-view environment. Most
of the existing approaches utilize pose information for multi-view action recognition. We …

Cited by 65 · Related articles · All 11 versions

[PDF] thecvf.com

Learning to look around: Intelligently exploring unseen environments for unknown tasks

D Jayaraman, K Grauman - Proceedings of the IEEE …, 2018 - openaccess.thecvf.com

It is common to implicitly assume access to intelligently captured inputs (e.g., photos from a
human photographer), yet autonomously capturing good observations is itself a major …

Cited by 123 · Related articles · All 12 versions

[PDF] usal.es

Visual content-based web page categorization with deep transfer learning and metric learning

D López-Sánchez, AG Arrieta, JM Corchado - Neurocomputing, 2019 - Elsevier

The growing amounts of online multimedia content challenge the current search,
recommendation and information retrieval systems. Information in the form of visual …

Cited by 71 · Related articles · All 3 versions

[PDF] arxiv.org

Embodied AI‐Driven Operation of Smart Cities: A Concise Review

F Shenavarmasouleh, FG Mohammadi… - Cyberphysical Smart …, 2022 - Wiley Online Library

An undeniable part of a smart city is its use of smart agents. These agents can vary a lot in
sizes, shapes, and functionalities. Embodied artificial intelligence is the field of study that …

Cited by 11 · Related articles · All 10 versions

[PDF] thecvf.com

Extreme relative pose estimation for RGB-D scans via scene completion

Z Yang, JZ Pan, L Luo, X Zhou… - Proceedings of the …, 2019 - openaccess.thecvf.com

Estimating the relative rigid pose between two RGB-D scans of the same underlying
environment is a fundamental problem in computer vision, robotics, and computer graphics …

Cited by 53 · Related articles · All 7 versions

[PDF] utexas.edu

End-to-end policy learning for active visual categorization

D Jayaraman, K Grauman - IEEE transactions on pattern …, 2018 - ieeexplore.ieee.org

Visual recognition systems mounted on autonomous moving agents face the challenge of
unconstrained data, but simultaneously have the opportunity to improve their performance …

Cited by 49 · Related articles · All 8 versions