Momentdiff: Generative video moment retrieval from random to real
Video moment retrieval pursues an efficient and generalized solution to identify the specific
temporal segments within an untrimmed video that correspond to a given language …
Clustseg: Clustering for universal segmentation
We present CLUSTSEG, a general, transformer-based framework that tackles different
image segmentation tasks (i.e., superpixel, semantic, instance, and panoptic) through a …
Logic-induced diagnostic reasoning for semi-supervised semantic segmentation
Recent advances in semi-supervised semantic segmentation have been heavily reliant on
pseudo labeling to compensate for limited labeled data, disregarding the valuable relational …
Diffusionret: Generative text-video retrieval with diffusion model
Existing text-video retrieval solutions are, in essence, discriminant models focused on
maximizing the conditional likelihood, i.e., p(candidates | query). While straightforward, this …
Generative semantic segmentation
We present Generative Semantic Segmentation (GSS), a generative learning
approach for semantic segmentation. Uniquely, we cast semantic segmentation as an image …
Clustering based point cloud representation learning for 3d analysis
Point cloud analysis (such as 3D segmentation and detection) is a challenging task,
because of not only the irregular geometries of many millions of unordered points, but also …
Fedseg: Class-heterogeneous federated learning for semantic segmentation
Federated Learning (FL) is a distributed learning paradigm that collaboratively learns a
global model across multiple clients with data privacy-preserving. Although many FL …
Local-global context aware transformer for language-guided video segmentation
We explore the task of language-guided video segmentation (LVS). Previous algorithms
mostly adopt 3D CNNs to learn video representation, struggling to capture long-term context …
Catr: Combinatorial-dependence audio-queried transformer for audio-visual video segmentation
Audio-visual video segmentation (AVVS) aims to generate pixel-level maps of sound-
producing objects within image frames and ensure the maps faithfully adhere to the given …
Sparsely annotated semantic segmentation with adaptive gaussian mixtures
Sparsely annotated semantic segmentation (SASS) aims to learn a segmentation model by
images with sparse labels (i.e., points or scribbles). Existing methods mainly focus on …