Core challenges in embodied vision-language planning

J Francis, N Kitamura, F Labelle, X Lu, I Navarro… - Journal of Artificial …, 2022 - jair.org
Recent advances in the areas of multimodal machine learning and artificial intelligence (AI)
have led to the development of challenging tasks at the intersection of Computer Vision …

Habitat-Web: Learning embodied object-search strategies from human demonstrations at scale

R Ramrakhya, E Undersander… - Proceedings of the …, 2022 - openaccess.thecvf.com
We present a large-scale study of imitating human demonstrations on tasks that require a
virtual robot to search for objects in new environments: (1) ObjectGoal Navigation (e.g., 'find & …

SoundSpaces: Audio-visual navigation in 3D environments

C Chen, U Jain, C Schissler, SVA Gari… - Computer Vision–ECCV …, 2020 - Springer
Moving around in the world is naturally a multisensory experience, but today's embodied
agents are deaf—restricted to solely their visual perception of the environment. We introduce …

Auxiliary tasks and exploration enable ObjectGoal navigation

J Ye, D Batra, A Das, E Wijmans - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
ObjectGoal Navigation (ObjectNav) is an embodied task wherein agents are to
navigate to an object instance in an unseen environment. Prior works have shown that end …

Excalibur: Encouraging and evaluating embodied exploration

H Zhu, R Kapoor, SY Min, W Han, J Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Experience precedes understanding. Humans constantly explore and learn about their
environment out of curiosity, gather information, and update their models of the world. On the …

A cordial sync: Going beyond marginal policies for multi-agent embodied tasks

U Jain, L Weihs, E Kolve, A Farhadi, S Lazebnik… - Computer Vision–ECCV …, 2020 - Springer
Autonomous agents must learn to collaborate. It is not scalable to develop a new centralized
agent every time a task's difficulty outpaces a single agent's abilities. While multi-agent …

Visual language navigation: A survey and open challenges

SM Park, YG Kim - Artificial Intelligence Review, 2023 - Springer
With the recent development of deep learning, AI models are widely used in various
domains. AI models show good performance for definite tasks such as image classification …

Perceiving the world: Question-guided reinforcement learning for text-based games

Y Xu, M Fang, L Chen, Y Du, JT Zhou… - arXiv preprint arXiv …, 2022 - arxiv.org
Text-based games provide an interactive way to study natural language processing. While
deep reinforcement learning has shown effectiveness in developing the game playing …

Light-weight probing of unsupervised representations for reinforcement learning

W Zhang, A GX-Chen, V Sobal, Y LeCun… - arXiv preprint arXiv …, 2022 - arxiv.org
Unsupervised visual representation learning offers the opportunity to leverage large corpora
of unlabeled trajectories to form useful visual representations, which can benefit the training …

ICAL: Continual learning of multimodal agents by transforming trajectories into actionable insights

G Sarch, L Jang, MJ Tarr, WW Cohen, K Marino… - arXiv preprint arXiv …, 2024 - arxiv.org
Large-scale generative language and vision-language models (LLMs and VLMs) excel in
few-shot in-context learning for decision making and instruction following. However, they …