Parameter-efficient fine-tuning for large models: A comprehensive survey

Z Han, C Gao, J Liu, J Zhang, SQ Zhang - arXiv preprint arXiv:2403.14608, 2024 - arxiv.org
Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

Weakly-supervised semantic segmentation with image-level labels: from traditional models to foundation models

Z Chen, Q Sun - ACM Computing Surveys, 2023 - dl.acm.org
The rapid development of deep learning has driven significant progress in image semantic
segmentation—a fundamental task in computer vision. Semantic segmentation algorithms …

OMG-Seg: Is one model good enough for all segmentation?

X Li, H Yuan, W Li, H Ding, S Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this work, we address various segmentation tasks, each traditionally tackled by distinct or
partially unified models. We propose OMG-Seg, One Model that is Good enough to efficiently …

Towards open vocabulary learning: A survey

J Wu, X Li, S Xu, H Yuan, H Ding… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
In the field of visual scene understanding, deep neural networks have made impressive
advancements in various core tasks like segmentation, tracking, and detection. However …

Open-vocabulary SAM: Segment and recognize twenty-thousand classes interactively

H Yuan, X Li, C Zhou, Y Li, K Chen, CC Loy - European Conference on …, 2025 - Springer
The CLIP and Segment Anything Model (SAM) are remarkable vision foundation
models (VFMs). SAM excels in segmentation tasks across diverse domains, whereas CLIP is …

CAT-Seg: Cost aggregation for open-vocabulary semantic segmentation

S Cho, H Shin, S Hong, A Arnab… - Proceedings of the …, 2024 - openaccess.thecvf.com
Open-vocabulary semantic segmentation presents the challenge of labeling each pixel
within an image based on a wide range of text descriptions. In this work we introduce a …

Osprey: Pixel understanding with visual instruction tuning

Y Yuan, W Li, J Liu, D Tang, X Luo… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal large language models (MLLMs) have recently achieved impressive general-purpose
vision-language capabilities through visual instruction tuning. However, current …

ReMaX: Relaxing for better training on efficient panoptic segmentation

S Sun, W Wang, A Howard, Q Yu… - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper presents a new mechanism to facilitate the training of mask transformers for
efficient panoptic segmentation, democratizing its deployment. We observe that due to the …

SED: A simple encoder-decoder for open-vocabulary semantic segmentation

B Xie, J Cao, J Xie, FS Khan… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Open-vocabulary semantic segmentation strives to distinguish pixels into different semantic
groups from an open set of categories. Most existing methods explore utilizing pre-trained …