An interactive agent foundation model

Z Durante, B Sarkar, R Gong, R Taori, Y Noda… - arXiv preprint arXiv …, 2024 - arxiv.org
The development of artificial intelligence systems is transitioning from creating static, task-specific
models to dynamic, agent-based systems capable of performing well in a wide …

Prioritized semantic learning for zero-shot instance navigation

X Sun, L Liu, H Zhi, R Qiu, J Liang - European Conference on Computer …, 2025 - Springer
We study zero-shot instance navigation, in which the agent navigates to a specific object
without using object annotations for training. Previous object navigation approaches apply …

Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation

J Chen, B Lin, X Liu, L Ma, X Liang… - arXiv preprint arXiv …, 2024 - arxiv.org
LLM-based agents have demonstrated impressive zero-shot performance in vision-language
navigation (VLN) task. However, existing LLM-based methods often focus only on …

OVExp: Open vocabulary exploration for object-oriented navigation

M Wei, T Wang, Y Chen, H Wang, J Pang… - arXiv preprint arXiv …, 2024 - arxiv.org
Object-oriented embodied navigation aims to locate specific objects, defined by category or
depicted in images. Existing methods often struggle to generalize to open vocabulary goals …

ManipVQA: Injecting robotic affordance and physically grounded information into multi-modal large language models

S Huang, I Ponomarenko, Z Jiang, X Li, X Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
The integration of Multimodal Large Language Models (MLLMs) with robotic systems has
significantly enhanced the ability of robots to interpret and act upon natural language …

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

Y Mu, J Chen, Q Zhang, S Chen, Q Yu, C Ge… - arXiv preprint arXiv …, 2024 - arxiv.org
Robotic behavior synthesis, the problem of understanding multimodal inputs and generating
precise physical control for robots, is an important part of Embodied AI. Despite successes in …

VoroNav: Voronoi-based zero-shot object navigation with large language model

P Wu, Y Mu, B Wu, Y Hou, J Ma, S Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
In the realm of household robotics, the Zero-Shot Object Navigation (ZSON) task empowers
agents to adeptly traverse unfamiliar environments and locate objects from novel categories …

GAMap: Zero-Shot Object Goal Navigation with Multi-Scale Geometric-Affordance Guidance

S Yuan, H Huang, Y Hao, C Wen, A Tzes… - arXiv preprint arXiv …, 2024 - arxiv.org
Zero-Shot Object Goal Navigation (ZS-OGN) enables robots or agents to navigate toward
objects of unseen categories without object-specific training. Traditional approaches often …

Cog-GA: A Large Language Models-based Generative Agent for Vision-Language Navigation in Continuous Environments

Z Li, Y Lu, Y Mu, H Qiao - arXiv preprint arXiv:2409.02522, 2024 - arxiv.org
Vision Language Navigation in Continuous Environments (VLN-CE) represents a frontier in
embodied AI, demanding agents to navigate freely in unbounded 3D spaces solely guided …

Exploring the reliability of foundation model-based frontier selection in zero-shot object goal navigation

S Yuan, HU Unlu, H Huang, C Wen, A Tzes… - … Conference on Pattern …, 2025 - Springer
In this paper, we present a novel method for reliable frontier selection in Zero-Shot Object
Goal Navigation (ZS-OGN), enhancing robotic navigation systems with foundation models to …