Training-free consistent text-to-image generation
Text-to-image models offer a new level of creative flexibility by allowing users to guide the
image generation process through natural language. However, using these models to …
Harnessing text-to-image diffusion models for category-agnostic pose estimation
Category-Agnostic Pose Estimation (CAPE) aims to detect keypoints of an arbitrary
unseen category in images, based on several provided examples of that category. This is a …
Diff-tracker: text-to-image diffusion models are unsupervised trackers
We introduce Diff-Tracker, a novel approach to the challenging unsupervised
visual tracking task that leverages a pre-trained text-to-image diffusion model. Our main idea …
Class-agnostic object counting with text-to-image diffusion model
X Hui, Q Wu, H Rahmani, J Liu - European Conference on Computer …, 2025 - Springer
Class-agnostic object counting aims to count objects of arbitrary classes with limited
information (e.g., a few exemplars or the class names) provided. It requires the model to …
Explore in-context segmentation via latent diffusion models
In-context segmentation has drawn more attention with the introduction of vision foundation
models. Most existing approaches adopt metric learning or masked image modeling to build …
TokenCompose: Text-to-Image Diffusion with Token-level Supervision
We present TokenCompose, a Latent Diffusion Model for text-to-image generation
that achieves enhanced consistency between user-specified text prompts and model …
Detecting Out-Of-Distribution Earth Observation Images with Diffusion Models
G Le Bellier, N Audebert - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Earth Observation imagery can capture rare and unusual events such as disasters and
major landscape changes whose visual appearance contrasts with the usual observations …
Tokencompose: Grounding diffusion with token-level supervision
We present TokenCompose, a Latent Diffusion Model for text-to-image generation that
achieves enhanced consistency between user-specified text prompts and model-generated …
One-shot in-context part segmentation
In this paper, we present the One-shot In-context Part Segmentation (OIParts) framework,
designed to tackle the challenges of part segmentation by leveraging visual foundation …
There is no SAMantics! Exploring SAM as a Backbone for Visual Understanding Tasks
The Segment Anything Model (SAM) was originally designed for label-agnostic mask
generation. Does this model also possess inherent semantic understanding, of value to …