ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

M Li, T Yang, H Kuang, J Wu, Z Wang, X Xiao… - … on Computer Vision, 2025 - Springer
To enhance the controllability of text-to-image diffusion models, existing efforts like
ControlNet incorporated image-based conditional controls. In this paper, we reveal that …

DragAPart: Learning a part-level motion prior for articulated objects

R Li, C Zheng, C Rupprecht, A Vedaldi - European Conference on …, 2025 - Springer
We introduce DragAPart, a method that, given an image and a set of drags as input,
generates a new image of the same object that responds to the action of the drags …

Diffusion models for monocular depth estimation: Overcoming challenging conditions

F Tosi, PZ Ramirez, M Poggi - European Conference on Computer Vision, 2025 - Springer
We present a novel approach designed to address the complexities posed by challenging,
out-of-distribution data in the single-image depth estimation task. Starting with images that …

SmartControl: Enhancing ControlNet for handling rough visual conditions

X Liu, Y Wei, M Liu, X Lin, P Ren, X Xie… - European Conference on …, 2025 - Springer
Recent text-to-image generation methods such as ControlNet have achieved remarkable
success in controlling image layouts, where the images generated by the default model are …

Controllable generation with text-to-image diffusion models: A survey

P Cao, F Zhou, Q Song, L Yang - arXiv preprint arXiv:2403.04279, 2024 - arxiv.org
In the rapidly advancing realm of visual generation, diffusion models have revolutionized the
landscape, marking a significant shift in capabilities with their impressive text-guided …

Multi-modal generative AI: Multi-modal LLM, diffusion and beyond

H Chen, X Wang, Y Zhou, B Huang, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-modal generative AI has received increasing attention in both academia and industry.
Particularly, two dominant families of techniques are: i) The multi-modal large language …

AnyControl: Create your artwork with versatile control on text-to-image generation

Y Sun, Y Liu, Y Tang, W Pei, K Chen - European Conference on Computer …, 2025 - Springer
The field of text-to-image (T2I) generation has made significant progress in recent years,
largely driven by advancements in diffusion models. Linguistic control enables effective …

When ControlNet Meets Inexplicit Masks: A Case Study of ControlNet on its Contour-following Ability

W Xuan, Y Xu, S Zhao, C Wang, J Liu, B Du… - Proceedings of the 32nd …, 2024 - dl.acm.org
ControlNet excels at creating content that closely matches precise contours in user-provided
masks. However, when these masks contain noise, as a frequent occurrence with non …

BootPIG: Bootstrapping zero-shot personalized image generation capabilities in pretrained diffusion models

S Purushwalkam, A Gokul, S Joty, N Naik - arXiv preprint arXiv …, 2024 - arxiv.org
Recent text-to-image generation models have demonstrated incredible success in
generating images that faithfully follow input prompts. However, the requirement of using …

A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models

X Shuai, H Ding, X Ma, R Tu, YG Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Image editing aims to modify a given synthetic or real image to meet users' specific
requirements. It has been widely studied in recent years as a promising and challenging field of …