Transformer-based visual segmentation: A survey
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …
segments or groups. This technique has numerous real-world applications, such as …
Segment and caption anything
We propose a method to efficiently equip the Segment Anything Model (SAM) with the ability
to generate regional captions. SAM presents strong generalizability to segment anything …
to generate regional captions. SAM presents strong generalizability to segment anything …
Sclip: Rethinking self-attention for dense vision-language inference
Recent advances in contrastive language-image pretraining (CLIP) have demonstrated
strong capabilities in zero-shot classification by aligning visual representations with target …
strong capabilities in zero-shot classification by aligning visual representations with target …
Panoptic vision-language feature fields
Recently, methods have been proposed for 3D open-vocabulary semantic segmentation.
Such methods are able to segment scenes into arbitrary classes based on text descriptions …
Such methods are able to segment scenes into arbitrary classes based on text descriptions …
TMCFN: Text-supervised multidimensional contrastive fusion network for hyperspectral and LiDAR classification
The joint classification of hyperspectral images (HSIs) and LiDAR data plays a crucial role in
Earth observation missions. Most advanced methods are based on discrete label …
Earth observation missions. Most advanced methods are based on discrete label …
Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation
JJ Wu, ACH Chang, CY Chuang… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper addresses text-supervised semantic segmentation aiming to learn a model
capable of segmenting arbitrary visual concepts within images by using only image-text …
capable of segmenting arbitrary visual concepts within images by using only image-text …
Self-guided open-vocabulary semantic segmentation
Vision-Language Models (VLMs) have emerged as promising tools for open-ended image
understanding tasks, including open vocabulary segmentation. Yet, direct application of …
understanding tasks, including open vocabulary segmentation. Yet, direct application of …
Image-to-Image Matching via Foundation Models: A New Perspective for Open-Vocabulary Semantic Segmentation
Open-vocabulary semantic segmentation (OVS) aims to segment images of arbitrary
categories specified by class labels or captions. However most previous best-performing …
categories specified by class labels or captions. However most previous best-performing …
Tagalign: Improving vision-language alignment with multi-tag classification
The crux of learning vision-language models is to extract semantically aligned information
from visual and linguistic data. Existing attempts usually face the problem of coarse …
from visual and linguistic data. Existing attempts usually face the problem of coarse …
Multi-modal recursive prompt learning with mixup embedding for generalization recognition
The contrastive language-image pretraining (CLIP) model has shown promise in
generalization recognition by combining visual and textual embeddings. However, the …
generalization recognition by combining visual and textual embeddings. However, the …