Deep learning-based 3D point cloud classification: A systematic survey and outlook

H Zhang, C Wang, S Tian, B Lu, L Zhang, X Ning, X Bai - Displays, 2023 - Elsevier
In recent years, point cloud representation has become a research hotspot in computer
vision and has been widely applied in many areas, such as autonomous …

Parameter-efficient fine-tuning for large models: A comprehensive survey

Z Han, C Gao, J Liu, SQ Zhang - arXiv preprint arXiv:2403.14608, 2024 - arxiv.org
Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …

LLaMA-Adapter: Efficient fine-tuning of language models with zero-init attention

R Zhang, J Han, C Liu, P Gao, A Zhou, X Hu… - arXiv preprint arXiv …, 2023 - arxiv.org
We present LLaMA-Adapter, a lightweight adaptation method to efficiently fine-tune LLaMA
into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter …
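
A minimal PyTorch sketch of the zero-init attention idea named in the title: attention from inserted adaption prompts is gated by a learnable factor initialized to zero, so the adapted model starts out identical to the frozen one. This is an illustrative single-head variant with a separate prompt-attention branch, not the paper's implementation; all module and tensor names are assumptions.

```python
import torch
import torch.nn as nn


class ZeroGatedPromptAttention(nn.Module):
    """Toy attention layer whose learnable prompts contribute through a
    gate initialized to zero (the zero-init attention idea)."""

    def __init__(self, dim: int, num_prompts: int = 10):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        # Gate starts at zero, so the prompt branch initially adds nothing.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        b, n, d = x.shape
        q = self.q(x)
        k_tok, v_tok = self.k(x), self.v(x)                  # token keys/values
        p = self.prompts.unsqueeze(0).expand(b, -1, -1)      # (b, prompts, dim)
        k_prm, v_prm = self.k(p), self.v(p)                  # prompt keys/values

        scale = d ** -0.5
        attn_tok = torch.softmax(q @ k_tok.transpose(-2, -1) * scale, dim=-1)
        attn_prm = torch.softmax(q @ k_prm.transpose(-2, -1) * scale, dim=-1)

        # Prompt branch is scaled by tanh(gate); at initialization this is 0.
        return attn_tok @ v_tok + torch.tanh(self.gate) * (attn_prm @ v_prm)


if __name__ == "__main__":
    layer = ZeroGatedPromptAttention(dim=64)
    x = torch.randn(2, 16, 64)
    print(layer(x).shape)  # torch.Size([2, 16, 64])
```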

ImageBind: One embedding space to bind them all

R Girdhar, A El-Nouby, Z Liu, M Singh… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present ImageBind, an approach to learn a joint embedding across six different
modalities: images, text, audio, depth, thermal, and IMU data. We show that all combinations …
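
The binding mechanism is contrastive: each additional modality's encoder is trained so its embeddings match the (frozen) image embeddings of naturally paired data, which places all modalities in one shared space. Below is a minimal sketch of a symmetric InfoNCE objective under that reading; the function name and temperature value are illustrative choices, not taken from the paper.

```python
import torch
import torch.nn.functional as F


def symmetric_infonce(img_emb: torch.Tensor,
                      mod_emb: torch.Tensor,
                      temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss between paired image embeddings and
    embeddings from another modality (audio, depth, IMU, ...).
    Both tensors are (batch, dim); row i of each is a positive pair."""
    img = F.normalize(img_emb, dim=-1)
    mod = F.normalize(mod_emb, dim=-1)
    logits = img @ mod.t() / temperature                  # (batch, batch) similarities
    targets = torch.arange(img.size(0), device=img.device)
    # Match images to modality samples and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    img = torch.randn(8, 512)   # e.g. frozen image-encoder outputs
    aud = torch.randn(8, 512)   # e.g. trainable audio-encoder outputs
    print(symmetric_infonce(img, aud).item())
```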

Conditional prompt learning for vision-language models

K Zhou, J Yang, CC Loy, Z Liu - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential
to investigate ways to adapt these models to downstream datasets. A recently proposed …
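
The conditional variant referred to here learns shared context vectors for the text prompt and shifts them with an instance-specific token produced by a lightweight meta-network from the image feature, so the prompt adapts to each input. The sketch below is a self-contained toy version of that idea: fixed random class embeddings stand in for a frozen text encoder, and the dimensions, pooling, and module names are assumptions rather than the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConditionalPromptLearner(nn.Module):
    """Toy conditional prompt learning: shared learnable context vectors
    are shifted by a per-image token from a small meta-network, then
    combined with fixed class embeddings to score each class."""

    def __init__(self, num_classes: int, ctx_len: int = 4, dim: int = 512):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(ctx_len, dim) * 0.02)   # shared context
        self.meta_net = nn.Sequential(                              # image -> bias token
            nn.Linear(dim, dim // 16), nn.ReLU(), nn.Linear(dim // 16, dim))
        # Stand-in for frozen class-name text embeddings.
        self.register_buffer("class_emb", torch.randn(num_classes, dim))

    def forward(self, img_feat: torch.Tensor) -> torch.Tensor:
        # img_feat: (batch, dim) from a frozen image encoder.
        bias = self.meta_net(img_feat).unsqueeze(1)                 # (b, 1, dim)
        ctx = self.ctx.unsqueeze(0) + bias                          # (b, ctx_len, dim)
        # Pool the conditioned context and add each class embedding; a real
        # implementation would run these tokens through CLIP's text encoder.
        prompt = ctx.mean(dim=1, keepdim=True) + self.class_emb.unsqueeze(0)
        text_feat = F.normalize(prompt, dim=-1)                     # (b, C, dim)
        img = F.normalize(img_feat, dim=-1).unsqueeze(-1)           # (b, dim, 1)
        return (text_feat @ img).squeeze(-1) / 0.07                 # (b, C) logits


if __name__ == "__main__":
    model = ConditionalPromptLearner(num_classes=10)
    feats = torch.randn(4, 512)
    print(model(feats).shape)  # torch.Size([4, 10])
```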

LLaMA-Adapter V2: Parameter-efficient visual instruction model

P Gao, J Han, R Zhang, Z Lin, S Geng, A Zhou… - arXiv preprint arXiv …, 2023 - arxiv.org
How to efficiently transform large language models (LLMs) into instruction followers has
recently become a popular research direction, while training LLMs for multi-modal reasoning remains …

Point-M2AE: Multi-scale masked autoencoders for hierarchical point cloud pre-training

R Zhang, Z Guo, P Gao, R Fang… - Advances in neural …, 2022 - proceedings.neurips.cc
Masked Autoencoders (MAE) have shown great potential in self-supervised pre-training for
language and 2D image transformers. However, it remains an open question how to …
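
The pre-training recipe described here is MAE-style masked reconstruction on point patches: hide a large fraction of patches, process the rest, and regress the hidden coordinates under a Chamfer-distance loss. The sketch below shows only a single-scale, simplified core (mask tokens pass through one shared encoder/decoder) rather than the paper's hierarchical multi-scale design; patch grouping, sizes, and the mask ratio are illustrative assumptions.

```python
import torch
import torch.nn as nn


def chamfer(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer distance between point sets a and b: (..., N, 3)."""
    d = torch.cdist(a, b)                                    # pairwise distances
    return d.min(dim=-1).values.mean() + d.min(dim=-2).values.mean()


class MaskedPointAutoencoder(nn.Module):
    """Single-scale sketch of masked point-patch reconstruction."""

    def __init__(self, pts_per_patch: int = 32, dim: int = 128,
                 mask_ratio: float = 0.6):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.encoder = nn.Sequential(nn.Linear(pts_per_patch * 3, dim),
                                     nn.ReLU(), nn.Linear(dim, dim))
        self.mask_token = nn.Parameter(torch.zeros(dim))
        self.decoder = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                     nn.Linear(dim, pts_per_patch * 3))

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, num_patches, pts_per_patch, 3), already grouped.
        b, g, k, _ = patches.shape
        num_mask = int(g * self.mask_ratio)
        perm = torch.rand(b, g, device=patches.device).argsort(dim=1)
        masked_idx = perm[:, :num_mask]                      # patches to hide
        hidden = torch.zeros(b, g, dtype=torch.bool, device=patches.device)
        hidden.scatter_(1, masked_idx, True)

        tokens = self.encoder(patches.flatten(2))            # (b, g, dim)
        # Replace hidden patches with the learnable mask token.
        tokens = torch.where(hidden.unsqueeze(-1), self.mask_token, tokens)

        pred = self.decoder(tokens).view(b, g, k, 3)         # reconstruct patches
        idx = masked_idx[..., None, None].expand(-1, -1, k, 3)
        # Loss is computed on the masked patches only.
        return chamfer(pred.gather(1, idx), patches.gather(1, idx))


if __name__ == "__main__":
    model = MaskedPointAutoencoder()
    loss = model(torch.randn(2, 64, 32, 3))
    print(loss.item())
```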

ULIP: Learning a unified representation of language, images, and point clouds for 3D understanding

L Xue, M Gao, C Xing, R Martín-Martín… - Proceedings of the …, 2023 - openaccess.thecvf.com
The recognition capabilities of current state-of-the-art 3D models are limited by datasets with
a small amount of annotated data and a pre-defined set of categories. In its 2D counterpart …
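
The unified representation is learned by keeping a pre-trained image-text model frozen and training a 3D encoder so that an object's point-cloud features align contrastively with that object's image and text features. Below is a self-contained sketch of such an objective with a toy point encoder and random stand-ins for the frozen features; the encoder design, equal loss weighting, and names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyPointEncoder(nn.Module):
    """PointNet-style stand-in for the trainable 3D encoder."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, dim))

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        # pts: (batch, num_points, 3) -> global feature via max pooling.
        return self.mlp(pts).max(dim=1).values


def info_nce(a: torch.Tensor, b: torch.Tensor, t: float = 0.07) -> torch.Tensor:
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / t
    target = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, target)


def unified_alignment_loss(points, img_feat, txt_feat, encoder):
    """Pull point-cloud features toward frozen image and text features
    of the same object (both alignment terms weighted equally here)."""
    pc_feat = encoder(points)
    return info_nce(pc_feat, img_feat) + info_nce(pc_feat, txt_feat)


if __name__ == "__main__":
    enc = TinyPointEncoder()
    pts = torch.randn(4, 1024, 3)
    img = torch.randn(4, 512)   # e.g. frozen CLIP image features
    txt = torch.randn(4, 512)   # e.g. frozen CLIP text features
    print(unified_alignment_loss(pts, img, txt, enc).item())
```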

Expanding language-image pretrained models for general video recognition

B Ni, H Peng, M Chen, S Zhang, G Meng, J Fu… - … on Computer Vision, 2022 - Springer
Contrastive language-image pretraining has shown great success in learning visual-textual
joint representation from web-scale data, demonstrating remarkable “zero-shot” …

PointCLIP V2: Prompting CLIP and GPT for powerful 3D open-world learning

X Zhu, R Zhang, B He, Z Guo, Z Zeng… - Proceedings of the …, 2023 - openaccess.thecvf.com
Large-scale pre-trained models have shown promising open-world performance for both
vision and language tasks. However, their transferred capacity on 3D point clouds is still …
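
The zero-shot pipeline implied by the title renders a point cloud into depth maps from several views, encodes them with a 2D vision-language model, and scores the view-averaged feature against text embeddings of class prompts. The sketch below keeps only that skeleton: the projection is a naive orthographic one, and the "image encoder" and class text features are random stand-ins, not CLIP or GPT-generated prompts.

```python
import torch
import torch.nn.functional as F


def depth_map(points: torch.Tensor, res: int = 64) -> torch.Tensor:
    """Orthographic projection of a point cloud (N, 3) in [-1, 1]^3 onto the
    xy-plane; each pixel keeps the nearest (largest-z) depth, background = -1."""
    xy = ((points[:, :2] + 1) / 2 * (res - 1)).long().clamp(0, res - 1)
    depth = points[:, 2]
    img = torch.full((res, res), -1.0)
    idx = xy[:, 1] * res + xy[:, 0]
    img.view(-1).scatter_reduce_(0, idx, depth, reduce="amax")
    return img


def zero_shot_logits(points, views, image_encoder, text_feat):
    """Average CLIP-style similarities over several projected views.
    `views` is a list of 3x3 rotation matrices; encoders are stand-ins."""
    feats = []
    for rot in views:
        d = depth_map(points @ rot.t()).unsqueeze(0).unsqueeze(0)  # (1, 1, H, W)
        feats.append(F.normalize(image_encoder(d), dim=-1))
    img_feat = torch.stack(feats).mean(dim=0)                      # view-averaged feature
    return img_feat @ F.normalize(text_feat, dim=-1).t()           # (1, num_classes)


if __name__ == "__main__":
    torch.manual_seed(0)
    pts = torch.rand(2048, 3) * 2 - 1                   # toy point cloud in [-1, 1]^3
    views = [torch.eye(3), torch.diag(torch.tensor([1., -1., -1.]))]
    enc = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64 * 64, 512))
    txt = torch.randn(10, 512)                          # stand-in class text features
    print(zero_shot_logits(pts, views, enc, txt).shape)  # torch.Size([1, 10])
```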