SceneFun3D: Fine-grained functionality and affordance understanding in 3D scenes

A Delitzas, A Takmaz, F Tombari… - Proceedings of the …, 2024 - openaccess.thecvf.com
Existing 3D scene understanding methods are heavily focused on 3D semantic and instance
segmentation. However, identifying objects and their parts only constitutes an intermediate …

Segment3D: Learning fine-grained class-agnostic 3D segmentation without manual labels

R Huang, S Peng, A Takmaz, F Tombari… - … on Computer Vision, 2025 - Springer
Current 3D scene segmentation methods are heavily dependent on manually annotated 3D
training datasets. Such manual annotations are labor-intensive, and often lack fine-grained …

OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views

F Engelmann, F Manhardt, M Niemeyer… - arXiv preprint arXiv …, 2024 - arxiv.org
Large visual-language models (VLMs), like CLIP, enable open-set image segmentation to
segment arbitrary concepts from an image in a zero-shot manner. This goes beyond the …

OrbitGrasp: SE(3)-Equivariant Grasp Learning

B Hu, X Zhu, D Wang, Z Dong, H Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
While grasp detection is an important part of any robotic manipulation pipeline, reliable and
accurate grasp detection in $SE(3)$ remains a research challenge. Many robotics …

SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs

Y Miao, F Engelmann, O Vysotska, F Tombari… - … on Computer Vision, 2025 - Springer
We introduce the task of localizing an input image within a multi-modal reference map
represented by a collection of 3D scene graphs. These scene graphs comprise multiple …

P2P-Bridge: Diffusion Bridges for 3D Point Cloud Denoising

M Vogel, K Tateno, M Pollefeys, F Tombari… - … on Computer Vision, 2025 - Springer
In this work, we address the task of point cloud denoising using a novel framework adapting
Diffusion Schrödinger bridges to unstructured data like point sets. Unlike previous works that …

Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds

O Lemke, Z Bauer, R Zurbrügg, M Pollefeys… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, modern techniques in deep learning and large-scale datasets have led to
impressive progress in 3D instance segmentation, grasp pose estimation, and robotics. This …

TARGO: Benchmarking Target-driven Object Grasping under Occlusions

Y Xia, R Ding, Z Qin, G Zhan, K Zhou, L Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances in predicting 6D grasp poses from a single depth image have led to
promising performance in robotic grasping. However, previous grasping models face …

OpenDAS: Domain Adaptation for Open-Vocabulary Segmentation

G Yilmaz, S Peng, F Engelmann, M Pollefeys… - arXiv preprint arXiv …, 2024 - arxiv.org
The advent of Vision Language Models (VLMs) transformed image understanding from
closed-set classifications to dynamic image-language interactions, enabling open …