VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View

R Schumann, W Zhu, W Feng, TJ Fu… - Proceedings of the …, 2024 - ojs.aaai.org
Incremental decision making in real-world environments is one of the most challenging tasks
in embodied artificial intelligence. One particularly demanding scenario is Vision and …

Loc4Plan: Locating Before Planning for Outdoor Vision and Language Navigation

H Tian, J Meng, WS Zheng, YM Li, J Yan… - Proceedings of the 32nd …, 2024 - dl.acm.org
Vision and Language Navigation (VLN) is a challenging task that requires agents to
understand instructions and navigate to the destination in a visual environment. One of the …

VLN-Video: Utilizing Driving Videos for Outdoor Vision-and-Language Navigation

J Li, A Padmakumar, G Sukhatme… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Outdoor Vision-and-Language Navigation (VLN) requires an agent to navigate through
realistic 3D outdoor environments based on natural language instructions. The performance …

FLAME: Learning to Navigate with Multimodal LLM in Urban Environments

Y Xu, Y Pan, Z Liu, H Wang - arXiv preprint arXiv:2408.11051, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated potential in Vision-and-Language
Navigation (VLN) tasks, yet current applications face challenges. While LLMs excel in …