A comprehensive survey of few-shot learning: Evolution, applications, challenges, and opportunities
Few-shot learning (FSL) has emerged as an effective learning method and shows great
potential. Despite the recent creative works in tackling FSL tasks, learning valid information …
Review of large vision models and visual prompt engineering
Visual prompt engineering is a fundamental methodology in the field of visual and image
artificial general intelligence. As the development of large vision models progresses, the …
An image is worth one word: Personalizing text-to-image generation using textual inversion
Text-to-image models offer unprecedented freedom to guide creation through natural
language. Yet, it is unclear how such freedom can be exercised to generate images of …
LLaMA-Adapter: Efficient fine-tuning of language models with zero-init attention
We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA
into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter …
MaPLe: Multi-modal prompt learning
Pre-trained vision-language (VL) models such as CLIP have shown excellent generalization
ability to downstream tasks. However, they are sensitive to the choice of input text prompts …
LLaMA-Adapter V2: Parameter-efficient visual instruction model
How to efficiently transform large language models (LLMs) into instruction followers is
recently a popular research direction, while training LLM for multi-modal reasoning remains …
Open-vocabulary semantic segmentation with mask-adapted CLIP
Open-vocabulary semantic segmentation aims to segment an image into semantic regions
according to text descriptions, which may not have been seen during training. Recent two …
Visual prompt tuning
The current modus operandi in adapting pre-trained models involves updating all the
backbone parameters, i.e., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) …
OneFormer: One transformer to rule universal image segmentation
Universal Image Segmentation is not a new concept. Past attempts to unify image
segmentation include scene parsing, panoptic segmentation, and, more recently, new …
Generating images with multimodal language models
We propose a method to fuse frozen text-only large language models (LLMs) with pre-
trained image encoder and decoder models, by mapping between their embedding spaces …