Vima: General robot manipulation with multimodal prompts
Prompt-based learning has emerged as a successful paradigm in natural language
processing, where a single general-purpose language model can be instructed to perform …
Navgpt: Explicit reasoning in vision-and-language navigation with large language models
Trained with an unprecedented scale of data, large language models (LLMs) like ChatGPT
and GPT-4 exhibit the emergence of significant reasoning abilities from model scaling. Such …
Scaling data generation in vision-and-language navigation
Recent research in language-guided visual navigation has demonstrated a significant
demand for the diversity of traversable environments and the quantity of supervision for …
Panogen: Text-conditioned panoramic environment generation for vision-and-language navigation
Vision-and-Language Navigation requires the agent to follow language instructions
to navigate through 3D environments. One main challenge in Vision-and-Language …
Lana: A language-capable navigator for instruction following and generation
Recently, visual-language navigation (VLN)--entailing robot agents to follow navigation
instructions--has shown great advance. However, existing literature put most emphasis on …
Discuss before moving: Visual language navigation via multi-expert discussions
Visual language navigation (VLN) is an embodied task demanding a wide range of skills
encompassing understanding, perception, and planning. For such a multifaceted challenge …
Bevbert: Multimodal map pre-training for language-guided navigation
Large-scale pre-training has shown promising results on the vision-and-language
navigation (VLN) task. However, most existing pre-training methods employ discrete …
Learning vision-and-language navigation from youtube videos
K Lin, P Chen, D Huang, TH Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision-and-language navigation (VLN) requires an embodied agent to navigate in realistic
3D environments using natural language instructions. Existing VLN methods suffer from …
Etpnav: Evolving topological planning for vision-language navigation in continuous environments
Vision-language navigation is a task that requires an agent to follow instructions to navigate
in environments. It becomes increasingly crucial in the field of embodied AI, with potential …
Object-goal visual navigation via effective exploration of relations among historical navigation states
Object-goal visual navigation aims at steering an agent toward an object via a series of
moving steps. Previous works mainly focus on learning informative visual representations for …