ShapeLLM: Universal 3D object understanding for embodied interaction

Z Qi, R Dong, S Zhang, H Geng, C Han, Z Ge… - … on Computer Vision, 2025 - Springer
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …

ManipLLM: Embodied multimodal large language model for object-centric robotic manipulation

X Li, M Zhang, Y Geng, H Geng… - Proceedings of the …, 2024 - openaccess.thecvf.com
Robot manipulation relies on accurately predicting contact points and end-effector directions
to ensure successful operation. However, learning-based robot manipulation trained on a …

ChainedDiffuser: Unifying trajectory diffusion and keypose prediction for robotic manipulation

Z Xian, N Gkanatsios, T Gervet, TW Ke… - … Annual Conference on …, 2023 - openreview.net
We present ChainedDiffuser, a policy architecture that unifies action keypose prediction and
trajectory diffusion generation for learning robot manipulation from demonstrations. Our …

NAP: Neural 3D articulated object prior

J Lei, C Deng, WB Shen, LJ Guibas… - Advances in Neural …, 2023 - proceedings.neurips.cc
We propose Neural 3D Articulated Object Prior (NAP), the first 3D deep generative
model to synthesize 3D articulated object models. Despite the extensive research on …

Robo-ABC: Affordance generalization beyond categories via semantic correspondence for robot manipulation

Y Ju, K Hu, G Zhang, G Zhang, M Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Enabling robotic manipulation that generalizes to out-of-distribution scenes is a crucial step
toward open-world embodied intelligence. For human beings, this ability is rooted in the …

ARNOLD: A benchmark for language-grounded task learning with continuous states in realistic 3D scenes

R Gong, J Huang, Y Zhao, H Geng… - Proceedings of the …, 2023 - openaccess.thecvf.com
Understanding the continuous states of objects is essential for task learning and planning in
the real world. However, most existing task learning benchmarks assume discrete (e.g., …

RAM: Retrieval-based affordance transfer for generalizable zero-shot robotic manipulation

Y Kuang, J Ye, H Geng, J Mao, C Deng… - arXiv preprint arXiv …, 2024 - arxiv.org
This work proposes a retrieve-and-transfer framework for zero-shot robotic manipulation,
dubbed RAM, featuring generalizability across various objects, environments, and …

UniDoorManip: Learning universal door manipulation policy over large-scale and diverse door manipulation environments

Y Li, X Zhang, R Wu, Z Zhang, Y Geng, H Dong… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning a universal manipulation policy that encompasses doors with diverse categories,
geometries, and mechanisms is crucial for future embodied agents to effectively work in …

Open-vocabulary affordance detection using knowledge distillation and text-point correlation

T Van Vo, MN Vu, B Huang, T Nguyen… - … on Robotics and …, 2024 - ieeexplore.ieee.org
Affordance detection presents intricate challenges and has a wide range of robotic
applications. Previous works have faced limitations such as the complexities of 3D object …

UniGarmentManip: A unified framework for category-level garment manipulation via dense visual correspondence

R Wu, H Lu, Y Wang, Y Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Garment manipulation (e.g., unfolding, folding, and hanging clothes) is essential for future
robots to accomplish home-assistant tasks, yet highly challenging due to the diversity of …