Vision-language pre-training: Basics, recent advances, and future trends
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …
Domain generalization: A survey
Generalization to out-of-distribution (OOD) data is a capability natural to humans yet
challenging for machines to reproduce. This is because most learning algorithms strongly …
LLaMA-Adapter: Efficient fine-tuning of language models with zero-init attention
We present LLaMA-Adapter, a lightweight adaptation method to efficiently fine-tune LLaMA
into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter …
MaPLe: Multi-modal prompt learning
Pre-trained vision-language (VL) models such as CLIP have shown excellent generalization
ability to downstream tasks. However, they are sensitive to the choice of input text prompts …
OneFormer: One transformer to rule universal image segmentation
Universal image segmentation is not a new concept. Past attempts to unify image
segmentation include scene parsing, panoptic segmentation, and, more recently, new …
Test-time prompt tuning for zero-shot generalization in vision-language models
Pre-trained vision-language models (e.g., CLIP) have shown promising zero-shot
generalization in many downstream tasks with properly designed text prompts. Instead of …
Generalized out-of-distribution detection: A survey
Out-of-distribution (OOD) detection is critical to ensuring the reliability and safety of machine
learning systems. For instance, in autonomous driving, we would like the driving system to …
PointCLIP V2: Prompting CLIP and GPT for powerful 3D open-world learning
Large-scale pre-trained models have shown promising open-world performance for both
vision and language tasks. However, their transferred capacity on 3D point clouds is still …
Learning to prompt for vision-language models
Large pre-trained vision-language models like CLIP have shown great potential in learning
representations that are transferable across a wide range of downstream tasks. Different …
PointCLIP: Point cloud understanding by CLIP
Recently, zero-shot and few-shot learning via Contrastive Language-Image Pre-training
(CLIP) have shown inspiring performance on 2D visual recognition, which learns to …