Prompt, generate, then cache: Cascade of foundation models makes strong few-shot learners

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org

Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

Cited by 365 · Related articles · All 3 versions

[PDF] arxiv.org

Domain specialization as the key to make large language models disruptive: A comprehensive survey

C Ling, X Zhao, J Lu, C Deng, C Zheng, J Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models (LLMs) have significantly advanced the field of natural language
processing (NLP), providing a highly useful, task-agnostic foundation for a wide range of …

Cited by 67 · Related articles · All 3 versions

[PDF] arxiv.org

LLaMA-Adapter: Efficient fine-tuning of language models with zero-init attention

R Zhang, J Han, C Liu, P Gao, A Zhou, X Hu… - arXiv preprint arXiv …, 2023 - arxiv.org

We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA
into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter …

Cited by 550 · Related articles · All 3 versions

[PDF] arxiv.org

A survey on multimodal large language models

S Yin, C Fu, S Zhao, K Li, X Sun, T Xu… - arXiv preprint arXiv …, 2023 - arxiv.org

Multimodal Large Language Model (MLLM) recently has been a new rising research
hotspot, which uses powerful Large Language Models (LLMs) as a brain to perform …

Cited by 708 · Related articles · All 6 versions

[PDF] arxiv.org

LLaMA-Adapter V2: Parameter-efficient visual instruction model

P Gao, J Han, R Zhang, Z Lin, S Geng, A Zhou… - arXiv preprint arXiv …, 2023 - arxiv.org

How to efficiently transform large language models (LLMs) into instruction followers is
recently a popular research direction, while training LLM for multi-modal reasoning remains …

Cited by 420 · Related articles · All 3 versions

[PDF] thecvf.com

Repurposing diffusion-based image generators for monocular depth estimation

B Ke, A Obukhov, S Huang, N Metzger… - Proceedings of the …, 2024 - openaccess.thecvf.com

Monocular depth estimation is a fundamental computer vision task. Recovering 3D depth
from a single image is geometrically ill-posed and requires scene understanding so it is not …

Cited by 79 · Related articles · All 3 versions

[PDF] neurips.cc

GraphAdapter: Tuning vision-language models with dual knowledge graph

X Li, D Lian, Z Lu, J Bai, Z Chen… - Advances in Neural …, 2024 - proceedings.neurips.cc

Adapter-style efficient transfer learning (ETL) has shown excellent performance in the tuning
of vision-language models (VLMs) under the low-data regime, where only a few additional …

Cited by 34 · Related articles · All 5 versions

[PDF] thecvf.com

PointCLIP V2: Prompting CLIP and GPT for powerful 3D open-world learning

X Zhu, R Zhang, B He, Z Guo, Z Zeng… - Proceedings of the …, 2023 - openaccess.thecvf.com

Large-scale pre-trained models have shown promising open-world performance for both
vision and language tasks. However, their transferred capacity on 3D point clouds is still …

Cited by 74 · Related articles · All 6 versions

[PDF] thecvf.com

SuS-X: Training-free name-only transfer of vision-language models

V Udandarao, A Gupta… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Contrastive Language-Image Pre-training (CLIP) has emerged as a simple yet
effective way to train large-scale vision-language models. CLIP demonstrates impressive …

Cited by 72 · Related articles · All 5 versions

[PDF] thecvf.com

Not all features matter: Enhancing few-shot CLIP with adaptive prior refinement

X Zhu, R Zhang, B He, A Zhou… - Proceedings of the …, 2023 - openaccess.thecvf.com

The popularity of Contrastive Language-Image Pre-training (CLIP) has propelled its
application to diverse downstream vision tasks. To improve its capacity on downstream …

Cited by 44 · Related articles · All 5 versions