A review of hybrid and ensemble in deep learning for natural language processing

J Jia, W Liang, Y Liang - arXiv preprint arXiv:2312.05589, 2023 - arxiv.org
This review presents a comprehensive exploration of hybrid and ensemble deep learning
models within Natural Language Processing (NLP), shedding light on their transformative …

3D-LLM: Injecting the 3D world into large language models

Y Hong, H Zhen, P Chen, S Zheng… - Advances in …, 2023 - proceedings.neurips.cc
Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to
excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be …

Where are we in the search for an artificial visual cortex for embodied intelligence?

A Majumdar, K Yadav, S Arnaud, J Ma… - Advances in …, 2024 - proceedings.neurips.cc
We present the largest and most comprehensive empirical study of pre-trained visual
representations (PVRs) or visual 'foundation models' for Embodied AI. First, we curate …

Navigating to objects in the real world

T Gervet, S Chintala, D Batra, J Malik, DS Chaplot - Science Robotics, 2023 - science.org
Semantic navigation is necessary to deploy mobile robots in uncontrolled environments
such as homes or hospitals. Many learning-based approaches have been proposed in …

OpenEQA: Embodied question answering in the era of foundation models

A Majumdar, A Ajay, X Zhang, P Putta… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present a modern formulation of Embodied Question Answering (EQA) as the task of
understanding an environment well enough to answer questions about it in natural …

CLIP-Fields: Weakly supervised semantic fields for robotic memory

NMM Shafiullah, C Paxton, L Pinto, S Chintala… - arXiv preprint arXiv …, 2022 - arxiv.org
We propose CLIP-Fields, an implicit scene model that can be used for a variety of tasks,
such as segmentation, instance identification, semantic search over space, and view …

3D concept learning and reasoning from multi-view images

Y Hong, C Lin, Y Du, Z Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Humans are able to accurately reason in 3D by gathering multi-view observations of the
surrounding world. Inspired by this insight, we introduce a new large-scale benchmark for …

EmbodiedScan: A holistic multi-modal 3D perception suite towards embodied AI

T Wang, X Mao, C Zhu, R Xu, R Lyu… - Proceedings of the …, 2024 - openaccess.thecvf.com
In the realm of computer vision and robotics, embodied agents are expected to explore their
environment and carry out human instructions. This necessitates the ability to fully …

PIRLNav: Pretraining with imitation and RL finetuning for ObjectNav

R Ramrakhya, D Batra, E Wijmans… - Proceedings of the …, 2023 - openaccess.thecvf.com
We study ObjectGoal Navigation, where a virtual robot situated in a new
environment is asked to navigate to an object. Prior work has shown that imitation learning …

Navigating to objects specified by images

J Krantz, T Gervet, K Yadav, A Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Images are a convenient way to specify which particular object instance an embodied agent
should navigate to. Solving this task requires semantic visual reasoning and exploration of …