When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

X Ma, Y Bhalgat, B Smart, S Chen, X Li, J Ding… - arXiv preprint arXiv …, 2024 - arxiv.org
As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs)
has seen rapid progress, offering unprecedented capabilities for understanding and …

Multi-Task Domain Adaptation for Language Grounding with 3D Objects

P Sun, Y Song, X Pan, P Dong, X Yang, Q Wang… - … on Computer Vision, 2024 - Springer
The existing works on object-level language grounding with 3D objects mostly focus on
improving performance by utilizing the off-the-shelf pre-trained models to capture features …

A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions

D Liu, Y Liu, W Huang, W Hu - arXiv preprint arXiv:2406.05785, 2024 - arxiv.org
Text-guided 3D visual grounding (T-3DVG), which aims to locate a specific object that
semantically corresponds to a language query from a complicated 3D scene, has drawn …

Reimagining 3D Visual Grounding: Instance Segmentation and Transformers for Fragmented Point Cloud Scenarios

Z Tan, W Yang, Z Wang - Proceedings of the 5th ACM International …, 2023 - dl.acm.org
This work introduces a pioneering, engineerable approach to 3D visual localization (3DVG).
Current challenges for 2D Visual Grounding (2DVG) and 3DVG are summarized: Absence of …