An Interactive Agent Foundation Model
The development of artificial intelligence systems is transitioning from creating static, task-
specific models to dynamic, agent-based systems capable of performing well in a wide …
Prioritized Semantic Learning for Zero-Shot Instance Navigation
We study zero-shot instance navigation, in which the agent navigates to a specific object
without using object annotations for training. Previous object navigation approaches apply …
Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation
LLM-based agents have demonstrated impressive zero-shot performance in the vision-language
navigation (VLN) task. However, existing LLM-based methods often focus only on …
OVExp: Open Vocabulary Exploration for Object-Oriented Navigation
Object-oriented embodied navigation aims to locate specific objects, defined by category or
depicted in images. Existing methods often struggle to generalize to open vocabulary goals …
ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models
The integration of Multimodal Large Language Models (MLLMs) with robotic systems has
significantly enhanced the ability of robots to interpret and act upon natural language …
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis
Robotic behavior synthesis, the problem of understanding multimodal inputs and generating
precise physical control for robots, is an important part of Embodied AI. Despite successes in …
VoroNav: Voronoi-Based Zero-Shot Object Navigation with Large Language Model
In the realm of household robotics, the Zero-Shot Object Navigation (ZSON) task empowers
agents to adeptly traverse unfamiliar environments and locate objects from novel categories …
GAMap: Zero-Shot Object Goal Navigation with Multi-Scale Geometric-Affordance Guidance
Zero-Shot Object Goal Navigation (ZS-OGN) enables robots or agents to navigate toward
objects of unseen categories without object-specific training. Traditional approaches often …
Cog-GA: A Large Language Models-based Generative Agent for Vision-Language Navigation in Continuous Environments
Vision Language Navigation in Continuous Environments (VLN-CE) represents a frontier in
embodied AI, requiring agents to navigate freely in unbounded 3D spaces solely guided …
Exploring the Reliability of Foundation Model-Based Frontier Selection in Zero-Shot Object Goal Navigation
In this paper, we present a novel method for reliable frontier selection in Zero-Shot Object
Goal Navigation (ZS-OGN), enhancing robotic navigation systems with foundation models to …