Convolutions die hard: Open-vocabulary segmentation with single frozen convolutional CLIP
Open-vocabulary segmentation is a challenging task requiring segmenting and recognizing
objects from an open set of categories in diverse environments. One way to address this …
OneFormer: One transformer to rule universal image segmentation
Universal Image Segmentation is not a new concept. Past attempts to unify image
segmentation include scene parsing, panoptic segmentation, and, more recently, new …
DETRs with hybrid matching
One-to-one set matching is a key design for DETR to establish its end-to-end capability, so
that object detection does not require a hand-crafted NMS (non-maximum suppression) to …
A generalist framework for panoptic segmentation of images and videos
Panoptic segmentation assigns semantic and instance ID labels to every pixel of an image.
As permutations of instance IDs are also valid solutions, the task requires learning of high …
Transformer meets remote sensing video detection and tracking: A comprehensive survey
Transformers have shown excellent performance in the remote sensing field thanks to their long-range
modeling capabilities. Remote sensing video (RSV) moving object detection and tracking …
Rank-DETR for high quality object detection
Modern detection transformers (DETRs) use a set of object queries to predict a list of
bounding boxes, sort them by their classification confidence scores, and select the top …
CLUSTSEG: Clustering for universal segmentation
We present CLUSTSEG, a general, transformer-based framework that tackles different
image segmentation tasks (i.e., superpixel, semantic, instance, and panoptic) through a …
ClusterFormer: Clustering as a universal visual learner
This paper presents ClusterFormer, a universal vision model that is based on the Clustering
paradigm with TransFormer. It comprises two novel designs: 1) recurrent cross-attention …
Transformer-based visual segmentation: A survey
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …
FastInst: A simple query-based model for real-time instance segmentation
Recent attention in instance segmentation has focused on query-based models. Despite
being non-maximum suppression (NMS)-free and end-to-end, the superiority of these …