Advancing 3D point cloud understanding through deep transfer learning: A comprehensive survey
The 3D point cloud (3DPC) has significantly evolved and benefited from the advance of
deep learning (DL). However, the latter faces various issues, including the lack of data or …
Video description: A comprehensive survey of deep learning approaches
Video description refers to understanding visual content and transforming that acquired
understanding into automatic textual narration. It bridges the key AI fields of computer vision …
2DPASS: 2D priors assisted semantic segmentation on LiDAR point clouds
As camera and LiDAR sensors capture complementary information in autonomous driving,
great efforts have been made to conduct semantic segmentation through multi-modality data …
SceneVerse: Scaling 3D vision-language learning for grounded scene understanding
3D vision-language (3D-VL) grounding, which aims to align language with 3D
physical environments, stands as a cornerstone in developing embodied agents. In …
Point-Bind & Point-LLM: Aligning point cloud with multi-modality for 3D understanding, generation, and instruction following
We introduce Point-Bind, a 3D multi-modality model aligning point clouds with 2D image,
language, audio, and video. Guided by ImageBind, we construct a joint embedding space …
EDA: Explicit text-decoupling and dense alignment for 3D visual grounding
3D visual grounding aims to find the object within point clouds mentioned by free-form
natural language descriptions with rich semantic cues. However, existing methods …
Context-aware alignment and mutual masking for 3D-language pre-training
3D visual language reasoning plays an important role in effective human-computer
interaction. The current approaches for 3D visual reasoning are task-specific, and lack pre …
UniT3D: A unified transformer for 3D dense captioning and visual grounding
Performing 3D dense captioning and visual grounding requires a common and shared
understanding of the underlying multimodal relationships. However, despite some previous …
VL-SAT: Visual-linguistic semantics assisted training for 3D semantic scene graph prediction in point cloud
The task of 3D semantic scene graph (3DSSG) prediction in the point cloud is challenging
since (1) the 3D point cloud only captures geometric structures with limited semantics …
End-to-end 3D dense captioning with Vote2Cap-DETR
3D dense captioning aims to generate multiple captions localized with their
associated object regions. Existing methods follow a sophisticated "detect-then-describe" …