A review of hybrid and ensemble in deep learning for natural language processing

J Jia, W Liang, Y Liang - arXiv preprint arXiv:2312.05589, 2023 - arxiv.org
This review presents a comprehensive exploration of hybrid and ensemble deep learning
models within Natural Language Processing (NLP), shedding light on their transformative …

3D-LLM: Injecting the 3D world into large language models

Y Hong, H Zhen, P Chen, S Zheng… - Advances in …, 2023 - proceedings.neurips.cc
Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to
excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be …

Where are we in the search for an artificial visual cortex for embodied intelligence?

A Majumdar, K Yadav, S Arnaud, J Ma… - Advances in …, 2024 - proceedings.neurips.cc
We present the largest and most comprehensive empirical study of pre-trained visual
representations (PVRs) or visual 'foundation models' for Embodied AI. First, we curate …

Navigating to objects in the real world

T Gervet, S Chintala, D Batra, J Malik, DS Chaplot - Science Robotics, 2023 - science.org
Semantic navigation is necessary to deploy mobile robots in uncontrolled environments
such as homes or hospitals. Many learning-based approaches have been proposed in …

OpenEQA: Embodied question answering in the era of foundation models

A Majumdar, A Ajay, X Zhang, P Putta… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present a modern formulation of Embodied Question Answering (EQA) as the task of
understanding an environment well enough to answer questions about it in natural …

CLIP-Fields: Weakly supervised semantic fields for robotic memory

NMM Shafiullah, C Paxton, L Pinto, S Chintala… - arXiv preprint arXiv …, 2022 - arxiv.org
We propose CLIP-Fields, an implicit scene model that can be used for a variety of tasks,
such as segmentation, instance identification, semantic search over space, and view …

3D concept learning and reasoning from multi-view images

Y Hong, C Lin, Y Du, Z Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Humans are able to accurately reason in 3D by gathering multi-view observations of the
surrounding world. Inspired by this insight, we introduce a new large-scale benchmark for …

EmbodiedScan: A holistic multi-modal 3D perception suite towards embodied AI

T Wang, X Mao, C Zhu, R Xu, R Lyu… - Proceedings of the …, 2024 - openaccess.thecvf.com
In the realm of computer vision and robotics, embodied agents are expected to explore their
environment and carry out human instructions. This necessitates the ability to fully …

PIRLNav: Pretraining with imitation and RL finetuning for ObjectNav

R Ramrakhya, D Batra, E Wijmans… - Proceedings of the …, 2023 - openaccess.thecvf.com
We study ObjectGoal Navigation, where a virtual robot situated in a new
environment is asked to navigate to an object. Prior work has shown that imitation learning …

Navigating to objects specified by images

J Krantz, T Gervet, K Yadav, A Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Images are a convenient way to specify which particular object instance an embodied agent
should navigate to. Solving this task requires semantic visual reasoning and exploration of …