Training-free consistent text-to-image generation

Y Tewel, O Kaduri, R Gal, Y Kasten, L Wolf… - ACM Transactions on …, 2024 - dl.acm.org
Text-to-image models offer a new level of creative flexibility by allowing users to guide the
image generation process through natural language. However, using these models to …

Harnessing text-to-image diffusion models for category-agnostic pose estimation

D Peng, Z Zhang, P Hu, Q Ke, DKY Yau… - European Conference on …, 2025 - Springer
Category-Agnostic Pose Estimation (CAPE) aims to detect keypoints of an arbitrary
unseen category in images, based on several provided examples of that category. This is a …

Diff-Tracker: text-to-image diffusion models are unsupervised trackers

Z Zhang, L Xu, D Peng, H Rahmani, J Liu - European Conference on …, 2025 - Springer
We introduce Diff-Tracker, a novel approach for the challenging unsupervised
visual tracking task leveraging the pre-trained text-to-image diffusion model. Our main idea …

Class-agnostic object counting with text-to-image diffusion model

X Hui, Q Wu, H Rahmani, J Liu - European Conference on Computer …, 2025 - Springer
Class-agnostic object counting aims to count objects of arbitrary classes with limited
information (e.g., a few exemplars or the class names) provided. It requires the model to …

Explore in-context segmentation via latent diffusion models

C Wang, X Li, H Ding, L Qi, J Zhang, Y Tong… - arXiv preprint arXiv …, 2024 - arxiv.org
In-context segmentation has drawn more attention with the introduction of vision foundation
models. Most existing approaches adopt metric learning or masked image modeling to build …

TokenCompose: Text-to-Image Diffusion with Token-level Supervision

Z Wang, Z Sha, Z Ding, Y Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
We present TokenCompose, a Latent Diffusion Model for text-to-image generation
that achieves enhanced consistency between user-specified text prompts and model …

Detecting Out-Of-Distribution Earth Observation Images with Diffusion Models

G Le Bellier, N Audebert - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Earth Observation imagery can capture rare and unusual events such as disasters and
major landscape changes whose visual appearance contrasts with the usual observations …

TokenCompose: Grounding diffusion with token-level supervision

Z Wang, Z Sha, Z Ding, Y Wang, Z Tu - arXiv preprint arXiv:2312.03626, 2023 - arxiv.org
We present TokenCompose, a Latent Diffusion Model for text-to-image generation that
achieves enhanced consistency between user-specified text prompts and model-generated …

One-shot in-context part segmentation

Z Dai, T Liu, X Zhang, Y Wei, Y Zhang - ACM Multimedia 2024, 2024 - openreview.net
In this paper, we present the One-shot In-context Part Segmentation (OIParts) framework,
designed to tackle the challenges of part segmentation by leveraging visual foundation …

There is no SAMantics! Exploring SAM as a Backbone for Visual Understanding Tasks

M Espinosa, C Yang, L Ericsson, S McDonagh… - arXiv preprint arXiv …, 2024 - arxiv.org
The Segment Anything Model (SAM) was originally designed for label-agnostic mask
generation. Does this model also possess inherent semantic understanding, of value to …