Unsupervised point cloud representation learning with deep neural networks: A survey
Point cloud data have been widely explored due to its superior accuracy and robustness
under various adverse situations. Meanwhile, deep neural networks (DNNs) have achieved …
under various adverse situations. Meanwhile, deep neural networks (DNNs) have achieved …
Prompt, generate, then cache: Cascade of foundation models makes strong few-shot learners
Visual recognition in low-data regimes requires deep neural networks to learn generalized
representations from limited training samples. Recently, CLIP-based methods have shown …
representations from limited training samples. Recently, CLIP-based methods have shown …
Tip-adapter: Training-free adaption of clip for few-shot classification
Abstract Contrastive Vision-Language Pre-training, known as CLIP, has provided a new
paradigm for learning visual representations using large-scale image-text pairs. It shows …
paradigm for learning visual representations using large-scale image-text pairs. It shows …
Learning 3d representations from 2d pre-trained models via image-to-point masked autoencoders
Pre-training by numerous image data has become de-facto for robust 2D representations. In
contrast, due to the expensive data processing, a paucity of 3D datasets severely hinders …
contrast, due to the expensive data processing, a paucity of 3D datasets severely hinders …
Ulip-2: Towards scalable multimodal pre-training for 3d understanding
Recent advancements in multimodal pre-training have shown promising efficacy in 3D
representation learning by aligning multimodal features across 3D shapes their 2D …
representation learning by aligning multimodal features across 3D shapes their 2D …
Contrast with reconstruct: Contrastive 3d representation learning guided by generative pretraining
Mainstream 3D representation learning approaches are built upon contrastive or generative
modeling pretext tasks, where great improvements in performance on various downstream …
modeling pretext tasks, where great improvements in performance on various downstream …
CLIP2: Contrastive language-image-point pretraining from real-world point cloud data
Abstract Contrastive Language-Image Pre-training, benefiting from large-scale unlabeled
text-image pairs, has demonstrated great performance in open-world vision understanding …
text-image pairs, has demonstrated great performance in open-world vision understanding …
Pimae: Point cloud and image interactive masked autoencoders for 3d object detection
Masked Autoencoders learn strong visual representations and achieve state-of-the-art
results in several independent modalities, yet very few works have addressed their …
results in several independent modalities, yet very few works have addressed their …
Transformer-based visual segmentation: A survey
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …
segments or groups. This technique has numerous real-world applications, such as …
Pointclip v2: Prompting clip and gpt for powerful 3d open-world learning
Large-scale pre-trained models have shown promising open-world performance for both
vision and language tasks. However, their transferred capacity on 3D point clouds is still …
vision and language tasks. However, their transferred capacity on 3D point clouds is still …