CLIP as RNN: Segment countless visual concepts without training endeavor
Existing open-vocabulary image segmentation methods require a fine-tuning step on mask
labels and/or image-text datasets. Mask labels are labor-intensive, which limits the number of …
TagAlign: Improving vision-language alignment with multi-tag classification
The crux of learning vision-language models is to extract semantically aligned information
from visual and linguistic data. Existing attempts usually face the problem of coarse …
Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation
JJ Wu, ACH Chang, CY Chuang… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper addresses text-supervised semantic segmentation aiming to learn a model
capable of segmenting arbitrary visual concepts within images by using only image-text …
Open-vocabulary segmentation with unpaired mask-text supervision
Contemporary cutting-edge open-vocabulary segmentation approaches commonly rely on
image-mask-text triplets, yet this restricted annotation is labour-intensive and encounters …
In defense of lazy visual grounding for open-vocabulary semantic segmentation
We present lazy visual grounding, a two-stage approach of unsupervised object mask
discovery followed by object grounding, for open-vocabulary semantic segmentation. Plenty …
AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation
A serious issue that harms the performance of zero-shot visual recognition is named
objective misalignment, i.e., the learning objective prioritizes improving the recognition …
Generalization Boosted Adapter for Open-Vocabulary Segmentation
Vision-language models (VLMs) have demonstrated remarkable open-vocabulary object
recognition capabilities, motivating their adaptation for dense prediction tasks like …
Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation
The pre-trained vision-language model, exemplified by CLIP, advances zero-shot semantic
segmentation by aligning visual features with class embeddings through a transformer …
Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation
Despite the significant progress in deep learning for dense visual recognition problems,
such as semantic segmentation, traditional methods are constrained by fixed class sets …
Multi-modal prototypes for open-world semantic segmentation
In semantic segmentation, generalizing a visual system to both seen categories and novel
categories at inference time has always been practically valuable yet challenging. To enable …