Retrospectives on the Embodied AI Workshop

M Deitke, D Batra, Y Bisk, T Campari, AX Chang… - arXiv preprint arXiv …, 2022 - arxiv.org
We present a retrospective on the state of Embodied AI research. Our analysis focuses on
13 challenges presented at the Embodied AI Workshop at CVPR. These challenges are …

Hierarchical cross-modal agent for robotics vision-and-language navigation

MZ Irshad, CY Ma, Z Kira - 2021 IEEE International Conference …, 2021 - ieeexplore.ieee.org
Deep Learning has revolutionized our ability to solve complex problems such as Vision-and-
Language Navigation (VLN). This task requires the agent to navigate to a goal purely based …

Interactive learning from activity description

KX Nguyen, D Misra, R Schapire… - International …, 2021 - proceedings.mlr.press
We present a novel interactive learning protocol that enables training request-fulfilling
agents by verbally describing their activities. Unlike imitation learning (IL), our protocol …

Semantically-aware spatio-temporal reasoning agent for vision-and-language navigation in continuous environments

MZ Irshad, NC Mithun, Z Seymour… - 2022 26th …, 2022 - ieeexplore.ieee.org
This paper presents a novel approach for the Vision-and-Language Navigation (VLN) task in
continuous 3D environments, which requires an autonomous agent to follow natural …

Iterative vision-and-language navigation

J Krantz, S Banerjee, W Zhu, J Corso… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present Iterative Vision-and-Language Navigation (IVLN), a paradigm for
evaluating language-guided agents navigating in a persistent environment over time …

FOAM: A follower-aware speaker model for vision-and-language navigation

ZY Dou, N Peng - arXiv preprint arXiv:2206.04294, 2022 - arxiv.org
The speaker-follower models have proven to be effective in vision-and-language navigation,
where a speaker model is used to synthesize new instructions to augment the training data …

Behavioral analysis of vision-and-language navigation agents

Z Yang, A Majumdar, S Lee - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
To be successful, Vision-and-Language Navigation (VLN) agents must be able to ground
instructions to actions based on their surroundings. In this work, we develop a methodology …

CLEAR: Improving vision-language navigation with cross-lingual, environment-agnostic representations

J Li, H Tan, M Bansal - arXiv preprint arXiv:2207.02185, 2022 - arxiv.org
Vision-and-Language Navigation (VLN) tasks require an agent to navigate through the
environment based on language instructions. In this paper, we aim to solve two key …

Define, evaluate, and improve task-oriented cognitive capabilities for instruction generation models

L Zhao, K Nguyen, H Daumé III - arXiv preprint arXiv:2301.05149, 2023 - arxiv.org
Recent work studies the cognitive capabilities of language models through psychological
tests designed for humans. While these studies are helpful for understanding the general …

GraphMapper: Efficient visual navigation by scene graph generation

Z Seymour, NC Mithun, HP Chiu… - 2022 26th …, 2022 - ieeexplore.ieee.org
Understanding the geometric relationships between objects in a scene is a core capability in
enabling both humans and autonomous agents to navigate in new environments. A sparse …