A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends

J Gui, T Chen, J Zhang, Q Cao, Z Sun… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Deep supervised learning algorithms typically require a large volume of labeled data to
achieve satisfactory performance. However, the process of collecting and labeling such data …

ULIP-2: Towards scalable multimodal pre-training for 3D understanding

L Xue, N Yu, S Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advancements in multimodal pre-training have shown promising efficacy in 3D
representation learning by aligning multimodal features across 3D shapes, their 2D …

OpenShape: Scaling up 3D shape representation towards open-world understanding

M Liu, R Shi, K Kuang, Y Zhu, X Li… - Advances in neural …, 2024 - proceedings.neurips.cc
We introduce OpenShape, a method for learning multi-modal joint representations of text,
image, and point clouds. We adopt the commonly used multi-modal contrastive learning …
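The multi-modal contrastive objective mentioned in this snippet is typically a CLIP-style symmetric InfoNCE loss that pulls matched point-cloud/text (or point-cloud/image) embeddings together in a shared space. A minimal hedged sketch in PyTorch, assuming pre-computed embeddings and a hypothetical `contrastive_loss` helper (not OpenShape's actual code):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(pc_emb, txt_emb, temperature=0.07):
    # Normalize so the dot product is cosine similarity.
    pc = F.normalize(pc_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    # Pairwise similarity logits, scaled by temperature.
    logits = pc @ txt.t() / temperature
    # Matched pairs sit on the diagonal of the similarity matrix.
    targets = torch.arange(pc.size(0))
    # Symmetric InfoNCE: point-cloud -> text and text -> point-cloud.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy batch of 8 paired 512-d embeddings.
loss = contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```

In practice the two embeddings come from separate encoders (a point-cloud backbone and a frozen CLIP text/image encoder), and the temperature is often a learnable parameter.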

DreamLLM: Synergistic multimodal comprehension and creation

R Dong, C Han, Y Peng, Z Qi, Z Ge, J Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper presents DreamLLM, a learning framework that first achieves versatile
Multimodal Large Language Models (MLLMs) empowered with frequently overlooked …

PointGPT: Auto-regressively generative pre-training from point clouds

G Chen, M Wang, Y Yang, K Yu… - Advances in Neural …, 2024 - proceedings.neurips.cc
Large language models (LLMs) based on the generative pre-training transformer (GPT)
have demonstrated remarkable effectiveness across a diverse range of downstream tasks …
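GPT-style pre-training on point clouds, as described here, orders the cloud into patches and trains a causal model to predict each next patch from the preceding ones. The sketch below is an illustrative stand-in, not PointGPT's architecture: it uses a GRU in place of a transformer decoder, and `NextPatchPredictor` is a hypothetical name.

```python
import torch
import torch.nn as nn

class NextPatchPredictor(nn.Module):
    """Toy auto-regressive model over ordered point-patch embeddings."""
    def __init__(self, dim=64):
        super().__init__()
        self.rnn = nn.GRU(dim, dim, batch_first=True)  # stand-in for a causal decoder
        self.head = nn.Linear(dim, dim)

    def forward(self, patch_emb):            # (B, N, dim) ordered patch embeddings
        h, _ = self.rnn(patch_emb[:, :-1])   # state at step i sees patches <= i
        return self.head(h)                  # predicted embedding of patch i+1

model = NextPatchPredictor()
patches = torch.randn(2, 16, 64)             # 2 clouds, 16 ordered patches each
pred = model(patches)                        # (2, 15, 64) next-patch predictions
# Regression pre-training objective: predict each next patch embedding.
loss = nn.functional.mse_loss(pred, patches[:, 1:])
```

The real method additionally needs a geometry-aware patch ordering (e.g. a space-filling curve), since raw point sets have no natural sequence.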

Point-Bind & Point-LLM: Aligning point cloud with multi-modality for 3D understanding, generation, and instruction following

Z Guo, R Zhang, X Zhu, Y Tang, X Ma, J Han… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce Point-Bind, a 3D multi-modality model aligning point clouds with 2D image,
language, audio, and video. Guided by ImageBind, we construct a joint embedding space …

DILF: Differentiable rendering-based multi-view Image–Language Fusion for zero-shot 3D shape understanding

X Ning, Z Yu, L Li, W Li, P Tiwari - Information Fusion, 2024 - Elsevier
Zero-shot 3D shape understanding aims to recognize “unseen” 3D categories that are not
present in training data. Recently, Contrastive Language–Image Pre-training (CLIP) has …

CLIP-FO3D: Learning free open-world 3D scene representations from 2D dense CLIP

J Zhang, R Dong, K Ma - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Training a 3D scene understanding model requires complicated human annotations, which
are laborious to collect and result in a model only encoding close-set object semantics. In …

PointMamba: A simple state space model for point cloud analysis

D Liang, X Zhou, X Wang, X Zhu, W Xu, Z Zou… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers have become one of the foundational architectures in point cloud analysis
tasks due to their excellent global modeling ability. However, the attention mechanism has …
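The complexity concern raised in this snippet is that self-attention builds an N×N similarity matrix, i.e. O(N²) in sequence length, while a state space model processes the sequence with a linear-time recurrence. A simplified sketch under stated assumptions (a fixed scalar decay; not the actual selective-scan Mamba layer):

```python
import torch

def ssm_scan(x, decay=0.9):
    """Toy diagonal linear state space scan: h_t = decay * h_{t-1} + x_t.

    x: (N, D) sequence of point-patch features. Each step touches only the
    previous state, so the whole pass costs O(N) rather than attention's O(N^2).
    """
    h = torch.zeros(x.size(1))
    out = []
    for t in range(x.size(0)):
        h = decay * h + x[t]   # recurrent state update, no pairwise matrix
        out.append(h)
    return torch.stack(out)    # (N, D): one output state per position

y = ssm_scan(torch.randn(128, 32))
```

Mamba replaces the fixed `decay` with input-dependent parameters and uses a parallel scan, but the asymptotic advantage over attention is the same.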

Uni3D: Exploring unified 3D representation at scale

J Zhou, J Wang, B Ma, YS Liu, T Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
Scaling up representations for images or text has been extensively investigated in the past
few years and has led to revolutions in learning vision and language. However, scalable …