Probing emergent semantics in predictive agents via question answering

J Francis, N Kitamura, F Labelle, X Lu, I Navarro… - Journal of Artificial …, 2022 - jair.org

Recent advances in the areas of multimodal machine learning and artificial intelligence (AI)
have led to the development of challenging tasks at the intersection of Computer Vision …

Cited by 42 · All 14 versions

Habitat-Web: Learning embodied object-search strategies from human demonstrations at scale

R Ramrakhya, E Undersander… - Proceedings of the …, 2022 - openaccess.thecvf.com

We present a large-scale study of imitating human demonstrations on tasks that require a
virtual robot to search for objects in new environments: (1) ObjectGoal Navigation (e.g., 'find & …

Cited by 88 · All 6 versions

SoundSpaces: Audio-visual navigation in 3D environments

C Chen, U Jain, C Schissler, SVA Gari… - Computer Vision–ECCV …, 2020 - Springer

Moving around in the world is naturally a multisensory experience, but today's embodied
agents are deaf—restricted to solely their visual perception of the environment. We introduce …

Cited by 277 · All 6 versions

Auxiliary tasks and exploration enable ObjectGoal navigation

J Ye, D Batra, A Das, E Wijmans - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

ObjectGoal Navigation (ObjectNav) is an embodied task wherein agents are to
navigate to an object instance in an unseen environment. Prior works have shown that end …

Cited by 87 · All 3 versions

EXCALIBUR: Encouraging and evaluating embodied exploration

H Zhu, R Kapoor, SY Min, W Han, J Li… - Proceedings of the …, 2023 - openaccess.thecvf.com

Experience precedes understanding. Humans constantly explore and learn about their
environment out of curiosity, gather information, and update their models of the world. On the …

Cited by 12 · All 4 versions

A cordial sync: Going beyond marginal policies for multi-agent embodied tasks

U Jain, L Weihs, E Kolve, A Farhadi, S Lazebnik… - Computer Vision–ECCV …, 2020 - Springer

Autonomous agents must learn to collaborate. It is not scalable to develop a new centralized
agent every time a task's difficulty outpaces a single agent's abilities. While multi-agent …

Cited by 56 · All 6 versions

Visual language navigation: A survey and open challenges

SM Park, YG Kim - Artificial Intelligence Review, 2023 - Springer

With the recent development of deep learning, AI models are widely used in various
domains. AI models show good performance for definite tasks such as image classification …

Cited by 24 · All 5 versions

Perceiving the world: Question-guided reinforcement learning for text-based games

Y Xu, M Fang, L Chen, Y Du, JT Zhou… - arXiv preprint arXiv …, 2022 - arxiv.org

Text-based games provide an interactive way to study natural language processing. While
deep reinforcement learning has shown effectiveness in developing the game playing …

Cited by 16 · All 7 versions

Light-weight probing of unsupervised representations for reinforcement learning

W Zhang, A GX-Chen, V Sobal, Y LeCun… - arXiv preprint arXiv …, 2022 - arxiv.org

Unsupervised visual representation learning offers the opportunity to leverage large corpora
of unlabeled trajectories to form useful visual representations, which can benefit the training …

Cited by 9 · All 3 versions

ICAL: Continual learning of multimodal agents by transforming trajectories into actionable insights

G Sarch, L Jang, MJ Tarr, WW Cohen, K Marino… - arXiv preprint arXiv …, 2024 - arxiv.org

Large-scale generative language and vision-language models (LLMs and VLMs) excel in
few-shot in-context learning for decision making and instruction following. However, they …

Cited by 1 · All 2 versions