Retrospectives on the Embodied AI Workshop

M Deitke, D Batra, Y Bisk, T Campari, AX Chang… - arXiv preprint arXiv …, 2022 - arxiv.org
We present a retrospective on the state of Embodied AI research. Our analysis focuses on
13 challenges presented at the Embodied AI Workshop at CVPR. These challenges are …

Hierarchical cross-modal agent for robotics vision-and-language navigation

MZ Irshad, CY Ma, Z Kira - 2021 IEEE International Conference …, 2021 - ieeexplore.ieee.org
Deep Learning has revolutionized our ability to solve complex problems such as Vision-and-
Language Navigation (VLN). This task requires the agent to navigate to a goal purely based …

Interactive learning from activity description

KX Nguyen, D Misra, R Schapire… - International …, 2021 - proceedings.mlr.press
We present a novel interactive learning protocol that enables training request-fulfilling
agents by verbally describing their activities. Unlike imitation learning (IL), our protocol …

Semantically-aware spatio-temporal reasoning agent for vision-and-language navigation in continuous environments

MZ Irshad, NC Mithun, Z Seymour… - 2022 26th …, 2022 - ieeexplore.ieee.org
This paper presents a novel approach for the Vision-and-Language Navigation (VLN) task in
continuous 3D environments, which requires an autonomous agent to follow natural …

Iterative vision-and-language navigation

J Krantz, S Banerjee, W Zhu, J Corso… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present Iterative Vision-and-Language Navigation (IVLN), a paradigm for
evaluating language-guided agents navigating in a persistent environment over time …

FOAM: A follower-aware speaker model for vision-and-language navigation

ZY Dou, N Peng - arXiv preprint arXiv:2206.04294, 2022 - arxiv.org
The speaker-follower models have proven to be effective in vision-and-language navigation,
where a speaker model is used to synthesize new instructions to augment the training data …

Behavioral analysis of vision-and-language navigation agents

Z Yang, A Majumdar, S Lee - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
To be successful, Vision-and-Language Navigation (VLN) agents must be able to ground
instructions to actions based on their surroundings. In this work, we develop a methodology …

CLEAR: Improving vision-language navigation with cross-lingual, environment-agnostic representations

J Li, H Tan, M Bansal - arXiv preprint arXiv:2207.02185, 2022 - arxiv.org
Vision-and-Language Navigation (VLN) tasks require an agent to navigate through the
environment based on language instructions. In this paper, we aim to solve two key …

Define, evaluate, and improve task-oriented cognitive capabilities for instruction generation models

L Zhao, K Nguyen, H Daumé III - arXiv preprint arXiv:2301.05149, 2023 - arxiv.org
Recent work studies the cognitive capabilities of language models through psychological
tests designed for humans. While these studies are helpful for understanding the general …

GraphMapper: Efficient visual navigation by scene graph generation

Z Seymour, NC Mithun, HP Chiu… - 2022 26th …, 2022 - ieeexplore.ieee.org
Understanding the geometric relationships between objects in a scene is a core capability in
enabling both humans and autonomous agents to navigate in new environments. A sparse …