Point Transformer V3: Simpler Faster Stronger

X Wu, L Jiang, PS Wang, Z Liu, X Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper is not motivated to seek innovation within the attention mechanism. Instead, it
focuses on overcoming the existing trade-offs between accuracy and efficiency within the …

UniPAD: A universal pre-training paradigm for autonomous driving

H Yang, S Zhang, D Huang, X Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
In the context of autonomous driving, the significance of effective feature learning is widely
acknowledged. While conventional 3D self-supervised pre-training methods have shown …

OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation

B Peng, X Wu, L Jiang, Y Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
The booming of 3D recognition in the 2020s began with the introduction of point cloud
transformers. They quickly overwhelmed sparse CNNs and became state-of-the-art models …

GroupContrast: Semantic-aware self-supervised representation learning for 3D understanding

C Wang, L Jiang, X Wu, Z Tian… - Proceedings of the …, 2024 - openaccess.thecvf.com
Self-supervised 3D representation learning aims to learn effective representations from
large-scale unlabeled point clouds. Most existing approaches adopt point discrimination as …

Skeleton-in-context: Unified skeleton sequence modeling with in-context learning

X Wang, Z Fang, X Li, X Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
In-context learning provides a new perspective for multi-task modeling for vision and NLP.
Under this setting, the model can perceive tasks from prompts and accomplish them without …

PonderV2: Pave the way for 3D foundation model with a universal pre-training paradigm

H Zhu, H Yang, X Wu, D Huang, S Zhang, X He… - arXiv preprint arXiv …, 2023 - arxiv.org
In contrast to numerous NLP and 2D computer vision foundational models, the learning of a
robust and highly generalized 3D foundational model poses considerably greater …

Multi-Space Alignments Towards Universal LiDAR Segmentation

Y Liu, L Kong, X Wu, R Chen, X Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
A unified and versatile LiDAR segmentation model with strong robustness and
generalizability is desirable for safe autonomous driving perception. This work presents …

UniMODE: Unified Monocular 3D Object Detection

Z Li, X Xu, SN Lim, H Zhao - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
Realizing unified monocular 3D object detection including both indoor and outdoor scenes
holds great importance in applications like robot navigation. However, involving various …

SaCo Loss: Sample-wise Affinity Consistency for Vision-Language Pre-training

S Wu, H Tan, Z Tian, Y Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
Vision-language pre-training (VLP) aims to learn joint representations of vision and
language modalities. The contrastive paradigm is currently dominant in this field. However …

Graph Transformer for 3D point clouds classification and semantic segmentation

W Zhou, Q Wang, W Jin, X Shi, Y He - Computers & Graphics, 2024 - Elsevier
Recently, graph-based and Transformer-based deep learning have demonstrated excellent
performances on various point cloud tasks. Most of the existing graph-based methods rely …