Point-m2ae: multi-scale masked autoencoders for hierarchical point cloud pre-training

A Xiao, J Huang, D Guan, X Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Point cloud data have been widely explored due to its superior accuracy and robustness
under various adverse situations. Meanwhile, deep neural networks (DNNs) have achieved …

被引用次数：109 相关文章所有 10 个版本

[PDF] thecvf.com

Prompt, generate, then cache: Cascade of foundation models makes strong few-shot learners

R Zhang, X Hu, B Li, S Huang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Visual recognition in low-data regimes requires deep neural networks to learn generalized
representations from limited training samples. Recently, CLIP-based methods have shown …

被引用次数：141 相关文章所有 5 个版本

[PDF] arxiv.org

Tip-adapter: Training-free adaption of clip for few-shot classification

R Zhang, W Zhang, R Fang, P Gao, K Li, J Dai… - European conference on …, 2022 - Springer

Abstract Contrastive Vision-Language Pre-training, known as CLIP, has provided a new
paradigm for learning visual representations using large-scale image-text pairs. It shows …

被引用次数：270 相关文章所有 6 个版本

[PDF] thecvf.com

Learning 3d representations from 2d pre-trained models via image-to-point masked autoencoders

R Zhang, L Wang, Y Qiao, P Gao… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Pre-training by numerous image data has become de-facto for robust 2D representations. In
contrast, due to the expensive data processing, a paucity of 3D datasets severely hinders …

被引用次数：116 相关文章所有 5 个版本

[PDF] thecvf.com

Ulip-2: Towards scalable multimodal pre-training for 3d understanding

L Xue, N Yu, S Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

Recent advancements in multimodal pre-training have shown promising efficacy in 3D
representation learning by aligning multimodal features across 3D shapes their 2D …

被引用次数：77 相关文章所有 3 个版本

[PDF] mlr.press

Contrast with reconstruct: Contrastive 3d representation learning guided by generative pretraining

Z Qi, R Dong, G Fan, Z Ge, X Zhang… - … on Machine Learning, 2023 - proceedings.mlr.press

Mainstream 3D representation learning approaches are built upon contrastive or generative
modeling pretext tasks, where great improvements in performance on various downstream …

被引用次数：93 相关文章所有 6 个版本

[PDF] thecvf.com

CLIP2: Contrastive language-image-point pretraining from real-world point cloud data

Y Zeng, C Jiang, J Mao, J Han, C Ye… - Proceedings of the …, 2023 - openaccess.thecvf.com

Abstract Contrastive Language-Image Pre-training, benefiting from large-scale unlabeled
text-image pairs, has demonstrated great performance in open-world vision understanding …

被引用次数：72 相关文章所有 5 个版本

[PDF] thecvf.com

Pimae: Point cloud and image interactive masked autoencoders for 3d object detection

A Chen, K Zhang, R Zhang, Z Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Masked Autoencoders learn strong visual representations and achieve state-of-the-art
results in several independent modalities, yet very few works have addressed their …

被引用次数：65 相关文章所有 7 个版本

[PDF] ieee.org

Transformer-based visual segmentation: A survey

X Li, H Ding, H Yuan, W Zhang, J Pang… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …

被引用次数：72 相关文章所有 3 个版本

[PDF] arxiv.org

Advancing 3D point cloud understanding through deep transfer learning: A comprehensive survey

SS Sohail, Y Himeur, H Kheddar, A Amira, F Fadli… - Information …, 2024 - Elsevier

The 3D point cloud (3DPC) has significantly evolved and benefited from the advance of
deep learning (DL). However, the latter faces various issues, including the lack of data or …

被引用次数：4 相关文章所有 2 个版本