Deep learning-based 3D point cloud classification: A systematic survey and outlook

H Zhang, C Wang, S Tian, B Lu, L Zhang, X Ning, X Bai - Displays, 2023 - Elsevier
In recent years, point cloud representation has become a research hotspot in computer
vision and has been widely applied in many areas, such as autonomous …

Parameter-efficient fine-tuning for large models: A comprehensive survey

Z Han, C Gao, J Liu, SQ Zhang - arXiv preprint arXiv:2403.14608, 2024 - arxiv.org
Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …

LLaMA-Adapter: Efficient fine-tuning of language models with zero-init attention

R Zhang, J Han, C Liu, P Gao, A Zhou, X Hu… - arXiv preprint arXiv …, 2023 - arxiv.org
We present LLaMA-Adapter, a lightweight adaptation method to efficiently fine-tune LLaMA
into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter …
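
A minimal PyTorch sketch of the zero-init attention idea named in the title: attention from inserted adaption prompts is gated by a learnable factor initialized to zero, so the adapted model starts out identical to the frozen one. This is an illustrative single-head variant with a separate prompt-attention branch, not the paper's implementation; all module and tensor names are assumptions.

```python
import torch
import torch.nn as nn


class ZeroGatedPromptAttention(nn.Module):
    """Toy attention layer whose learnable prompts contribute through a
    gate initialized to zero (the zero-init attention idea)."""

    def __init__(self, dim: int, num_prompts: int = 10):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        # Gate starts at zero, so the prompt branch initially adds nothing.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        b, n, d = x.shape
        q = self.q(x)
        k_tok, v_tok = self.k(x), self.v(x)                  # token keys/values
        p = self.prompts.unsqueeze(0).expand(b, -1, -1)      # (b, prompts, dim)
        k_prm, v_prm = self.k(p), self.v(p)                  # prompt keys/values

        scale = d ** -0.5
        attn_tok = torch.softmax(q @ k_tok.transpose(-2, -1) * scale, dim=-1)
        attn_prm = torch.softmax(q @ k_prm.transpose(-2, -1) * scale, dim=-1)

        # Prompt branch is scaled by tanh(gate); at initialization this is 0.
        return attn_tok @ v_tok + torch.tanh(self.gate) * (attn_prm @ v_prm)


if __name__ == "__main__":
    layer = ZeroGatedPromptAttention(dim=64)
    x = torch.randn(2, 16, 64)
    print(layer(x).shape)  # torch.Size([2, 16, 64])
```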

ImageBind: One embedding space to bind them all

R Girdhar, A El-Nouby, Z Liu, M Singh… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present ImageBind, an approach to learn a joint embedding across six different
modalities: images, text, audio, depth, thermal, and IMU data. We show that all combinations …
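
The binding mechanism is contrastive: each additional modality's encoder is trained so its embeddings match the (frozen) image embeddings of naturally paired data, which places all modalities in one shared space. Below is a minimal sketch of a symmetric InfoNCE objective under that reading; the function name and temperature value are illustrative choices, not taken from the paper.

```python
import torch
import torch.nn.functional as F


def symmetric_infonce(img_emb: torch.Tensor,
                      mod_emb: torch.Tensor,
                      temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss between paired image embeddings and
    embeddings from another modality (audio, depth, IMU, ...).
    Both tensors are (batch, dim); row i of each is a positive pair."""
    img = F.normalize(img_emb, dim=-1)
    mod = F.normalize(mod_emb, dim=-1)
    logits = img @ mod.t() / temperature                  # (batch, batch) similarities
    targets = torch.arange(img.size(0), device=img.device)
    # Match images to modality samples and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    img = torch.randn(8, 512)   # e.g. frozen image-encoder outputs
    aud = torch.randn(8, 512)   # e.g. trainable audio-encoder outputs
    print(symmetric_infonce(img, aud).item())
```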

Conditional prompt learning for vision-language models

K Zhou, J Yang, CC Loy, Z Liu - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential
to investigate ways to adapt these models to downstream datasets. A recently proposed …
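
The conditional variant referred to here learns shared context vectors for the text prompt and shifts them with an instance-specific token produced by a lightweight meta-network from the image feature, so the prompt adapts to each input. The sketch below is a self-contained toy version of that idea: fixed random class embeddings stand in for a frozen text encoder, and the dimensions, pooling, and module names are assumptions rather than the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConditionalPromptLearner(nn.Module):
    """Toy conditional prompt learning: shared learnable context vectors
    are shifted by a per-image token from a small meta-network, then
    combined with fixed class embeddings to score each class."""

    def __init__(self, num_classes: int, ctx_len: int = 4, dim: int = 512):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(ctx_len, dim) * 0.02)   # shared context
        self.meta_net = nn.Sequential(                              # image -> bias token
            nn.Linear(dim, dim // 16), nn.ReLU(), nn.Linear(dim // 16, dim))
        # Stand-in for frozen class-name text embeddings.
        self.register_buffer("class_emb", torch.randn(num_classes, dim))

    def forward(self, img_feat: torch.Tensor) -> torch.Tensor:
        # img_feat: (batch, dim) from a frozen image encoder.
        bias = self.meta_net(img_feat).unsqueeze(1)                 # (b, 1, dim)
        ctx = self.ctx.unsqueeze(0) + bias                          # (b, ctx_len, dim)
        # Pool the conditioned context and add each class embedding; a real
        # implementation would run these tokens through CLIP's text encoder.
        prompt = ctx.mean(dim=1, keepdim=True) + self.class_emb.unsqueeze(0)
        text_feat = F.normalize(prompt, dim=-1)                     # (b, C, dim)
        img = F.normalize(img_feat, dim=-1).unsqueeze(-1)           # (b, dim, 1)
        return (text_feat @ img).squeeze(-1) / 0.07                 # (b, C) logits


if __name__ == "__main__":
    model = ConditionalPromptLearner(num_classes=10)
    feats = torch.randn(4, 512)
    print(model(feats).shape)  # torch.Size([4, 10])
```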

LLaMA-Adapter V2: Parameter-efficient visual instruction model

P Gao, J Han, R Zhang, Z Lin, S Geng, A Zhou… - arXiv preprint arXiv …, 2023 - arxiv.org
How to efficiently transform large language models (LLMs) into instruction followers has
recently become a popular research direction, while training LLMs for multi-modal reasoning remains …

Point-M2AE: Multi-scale masked autoencoders for hierarchical point cloud pre-training

R Zhang, Z Guo, P Gao, R Fang… - Advances in neural …, 2022 - proceedings.neurips.cc
Masked Autoencoders (MAE) have shown great potential in self-supervised pre-training for
language and 2D image transformers. However, it remains an open question how to …
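
The pre-training recipe described here is MAE-style masked reconstruction on point patches: hide a large fraction of patches, process the rest, and regress the hidden coordinates under a Chamfer-distance loss. The sketch below shows only a single-scale, simplified core (mask tokens pass through one shared encoder/decoder) rather than the paper's hierarchical multi-scale design; patch grouping, sizes, and the mask ratio are illustrative assumptions.

```python
import torch
import torch.nn as nn


def chamfer(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer distance between point sets a and b: (..., N, 3)."""
    d = torch.cdist(a, b)                                    # pairwise distances
    return d.min(dim=-1).values.mean() + d.min(dim=-2).values.mean()


class MaskedPointAutoencoder(nn.Module):
    """Single-scale sketch of masked point-patch reconstruction."""

    def __init__(self, pts_per_patch: int = 32, dim: int = 128,
                 mask_ratio: float = 0.6):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.encoder = nn.Sequential(nn.Linear(pts_per_patch * 3, dim),
                                     nn.ReLU(), nn.Linear(dim, dim))
        self.mask_token = nn.Parameter(torch.zeros(dim))
        self.decoder = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                     nn.Linear(dim, pts_per_patch * 3))

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, num_patches, pts_per_patch, 3), already grouped.
        b, g, k, _ = patches.shape
        num_mask = int(g * self.mask_ratio)
        perm = torch.rand(b, g, device=patches.device).argsort(dim=1)
        masked_idx = perm[:, :num_mask]                      # patches to hide
        hidden = torch.zeros(b, g, dtype=torch.bool, device=patches.device)
        hidden.scatter_(1, masked_idx, True)

        tokens = self.encoder(patches.flatten(2))            # (b, g, dim)
        # Replace hidden patches with the learnable mask token.
        tokens = torch.where(hidden.unsqueeze(-1), self.mask_token, tokens)

        pred = self.decoder(tokens).view(b, g, k, 3)         # reconstruct patches
        idx = masked_idx[..., None, None].expand(-1, -1, k, 3)
        # Loss is computed on the masked patches only.
        return chamfer(pred.gather(1, idx), patches.gather(1, idx))


if __name__ == "__main__":
    model = MaskedPointAutoencoder()
    loss = model(torch.randn(2, 64, 32, 3))
    print(loss.item())
```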

ULIP: Learning a unified representation of language, images, and point clouds for 3D understanding

L Xue, M Gao, C Xing, R Martín-Martín… - Proceedings of the …, 2023 - openaccess.thecvf.com
The recognition capabilities of current state-of-the-art 3D models are limited by datasets with
a small amount of annotated data and a pre-defined set of categories. In its 2D counterpart …
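
The unified representation is learned by keeping a pre-trained image-text model frozen and training a 3D encoder so that an object's point-cloud features align contrastively with that object's image and text features. Below is a self-contained sketch of such an objective with a toy point encoder and random stand-ins for the frozen features; the encoder design, equal loss weighting, and names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyPointEncoder(nn.Module):
    """PointNet-style stand-in for the trainable 3D encoder."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, dim))

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        # pts: (batch, num_points, 3) -> global feature via max pooling.
        return self.mlp(pts).max(dim=1).values


def info_nce(a: torch.Tensor, b: torch.Tensor, t: float = 0.07) -> torch.Tensor:
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / t
    target = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, target)


def unified_alignment_loss(points, img_feat, txt_feat, encoder):
    """Pull point-cloud features toward frozen image and text features
    of the same object (both alignment terms weighted equally here)."""
    pc_feat = encoder(points)
    return info_nce(pc_feat, img_feat) + info_nce(pc_feat, txt_feat)


if __name__ == "__main__":
    enc = TinyPointEncoder()
    pts = torch.randn(4, 1024, 3)
    img = torch.randn(4, 512)   # e.g. frozen CLIP image features
    txt = torch.randn(4, 512)   # e.g. frozen CLIP text features
    print(unified_alignment_loss(pts, img, txt, enc).item())
```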

Expanding language-image pretrained models for general video recognition

B Ni, H Peng, M Chen, S Zhang, G Meng, J Fu… - … on Computer Vision, 2022 - Springer
Contrastive language-image pretraining has shown great success in learning visual-textual
joint representation from web-scale data, demonstrating remarkable “zero-shot” …

PointCLIP V2: Prompting CLIP and GPT for powerful 3D open-world learning

X Zhu, R Zhang, B He, Z Guo, Z Zeng… - Proceedings of the …, 2023 - openaccess.thecvf.com
Large-scale pre-trained models have shown promising open-world performance for both
vision and language tasks. However, their transferred capacity on 3D point clouds is still …
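
The zero-shot pipeline implied by the title renders a point cloud into depth maps from several views, encodes them with a 2D vision-language model, and scores the view-averaged feature against text embeddings of class prompts. The sketch below keeps only that skeleton: the projection is a naive orthographic one, and the "image encoder" and class text features are random stand-ins, not CLIP or GPT-generated prompts.

```python
import torch
import torch.nn.functional as F


def depth_map(points: torch.Tensor, res: int = 64) -> torch.Tensor:
    """Orthographic projection of a point cloud (N, 3) in [-1, 1]^3 onto the
    xy-plane; each pixel keeps the nearest (largest-z) depth, background = -1."""
    xy = ((points[:, :2] + 1) / 2 * (res - 1)).long().clamp(0, res - 1)
    depth = points[:, 2]
    img = torch.full((res, res), -1.0)
    idx = xy[:, 1] * res + xy[:, 0]
    img.view(-1).scatter_reduce_(0, idx, depth, reduce="amax")
    return img


def zero_shot_logits(points, views, image_encoder, text_feat):
    """Average CLIP-style similarities over several projected views.
    `views` is a list of 3x3 rotation matrices; encoders are stand-ins."""
    feats = []
    for rot in views:
        d = depth_map(points @ rot.t()).unsqueeze(0).unsqueeze(0)  # (1, 1, H, W)
        feats.append(F.normalize(image_encoder(d), dim=-1))
    img_feat = torch.stack(feats).mean(dim=0)                      # view-averaged feature
    return img_feat @ F.normalize(text_feat, dim=-1).t()           # (1, num_classes)


if __name__ == "__main__":
    torch.manual_seed(0)
    pts = torch.rand(2048, 3) * 2 - 1                   # toy point cloud in [-1, 1]^3
    views = [torch.eye(3), torch.diag(torch.tensor([1., -1., -1.]))]
    enc = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64 * 64, 512))
    txt = torch.randn(10, 512)                          # stand-in class text features
    print(zero_shot_logits(pts, views, enc, txt).shape)  # torch.Size([1, 10])
```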