VELMA: Verbalization embodiment of LLM agents for vision and language navigation in street view
Incremental decision making in real-world environments is one of the most challenging tasks
in embodied artificial intelligence. One particularly demanding scenario is Vision and …
Loc4Plan: Locating before planning for outdoor vision and language navigation
Vision and Language Navigation (VLN) is a challenging task that requires agents to
understand instructions and navigate to the destination in a visual environment. One of the …
VLN-Video: Utilizing driving videos for outdoor vision-and-language navigation
Outdoor Vision-and-Language Navigation (VLN) requires an agent to navigate through
realistic 3D outdoor environments based on natural language instructions. The performance …
FLAME: Learning to navigate with multimodal LLM in urban environments
Large Language Models (LLMs) have demonstrated potential in Vision-and-Language
Navigation (VLN) tasks, yet current applications face challenges. While LLMs excel in …