Parameter-efficient fine-tuning for large models: A comprehensive survey
Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …
Multimodal foundation models: From specialists to general-purpose assistants
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …
Weakly-supervised semantic segmentation with image-level labels: from traditional models to foundation models
The rapid development of deep learning has driven significant progress in image semantic
segmentation—a fundamental task in computer vision. Semantic segmentation algorithms …
OMG-Seg: Is one model good enough for all segmentation?
In this work, we address various segmentation tasks, each traditionally tackled by distinct or
partially unified models. We propose OMG-Seg, One Model that is Good enough to efficiently …
Towards open vocabulary learning: A survey
In the field of visual scene understanding, deep neural networks have made impressive
advancements in various core tasks like segmentation, tracking, and detection. However …
Open-vocabulary SAM: Segment and recognize twenty-thousand classes interactively
The CLIP and Segment Anything Model (SAM) are remarkable vision foundation
models (VFMs). SAM excels in segmentation tasks across diverse domains, whereas CLIP is …
CAT-Seg: Cost aggregation for open-vocabulary semantic segmentation
Open-vocabulary semantic segmentation presents the challenge of labeling each pixel
within an image based on a wide range of text descriptions. In this work, we introduce a …
Osprey: Pixel understanding with visual instruction tuning
Multimodal large language models (MLLMs) have recently achieved impressive general-purpose
vision-language capabilities through visual instruction tuning. However, current …
ReMaX: Relaxing for better training on efficient panoptic segmentation
This paper presents a new mechanism to facilitate the training of mask transformers for
efficient panoptic segmentation, democratizing its deployment. We observe that due to the …
SED: A simple encoder-decoder for open-vocabulary semantic segmentation
Open-vocabulary semantic segmentation strives to distinguish pixels into different semantic
groups from an open set of categories. Most existing methods explore utilizing pre-trained …