DragAnything: Motion control for anything using entity representation

W Wu, Z Li, Y Gu, R Zhao, Y He, DJ Zhang… - … on Computer Vision, 2025 - Springer
We introduce DragAnything, which utilizes an entity representation to achieve motion control
for any object in controllable video generation. Comparison to existing motion control …

Readout guidance: Learning control from diffusion features

G Luo, T Darrell, O Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present Readout Guidance, a method for controlling text-to-image diffusion
models with learned signals. Readout Guidance uses readout heads, lightweight networks …

MagicDrive: Street view generation with diverse 3D geometry control

R Gao, K Chen, E Xie, L Hong, Z Li, DY Yeung… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advancements in diffusion models have significantly enhanced data synthesis
with 2D control. Yet, precise 3D control in street view generation, crucial for 3D perception …

Focus on your instruction: Fine-grained and multi-instruction image editing by attention modulation

Q Guo, T Lin - Proceedings of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Recently, diffusion-based methods like InstructPix2Pix (IP2P) have achieved effective
instruction-based image editing, requiring only natural language instructions from the user …

DGInStyle: Domain-generalizable semantic segmentation with image diffusion models and stylized semantic control

Y Jia, L Hoyer, S Huang, T Wang, L Van Gool… - … on Computer Vision, 2025 - Springer
Large, pretrained latent diffusion models (LDMs) have demonstrated an extraordinary ability
to generate creative content, specialize to user data through few-shot fine-tuning, and …

HOIDiffusion: Generating realistic 3D hand-object interaction data

M Zhang, Y Fu, Z Ding, S Liu, Z Tu… - Proceedings of the …, 2024 - openaccess.thecvf.com
3D hand-object interaction data is scarce due to the hardware constraints in scaling
up the data collection process. In this paper, we propose HOIDiffusion for generating realistic …

Generative models: What do they know? Do they know things? Let's find out!

X Du, N Kolkin, G Shakhnarovich, A Bhattad - arXiv preprint arXiv …, 2023 - arxiv.org
Generative models excel at mimicking real scenes, suggesting they might inherently encode
important intrinsic scene properties. In this paper, we aim to explore the following key …

UniGS: Unified representation for image generation and segmentation

L Qi, L Yang, W Guo, Y Xu, B Du… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper introduces a novel unified representation of diffusion models for image
generation and segmentation. Specifically, we use a colormap to represent entity-level …

Deepfake: definitions, performance metrics and standards, datasets, and a meta-review

E Altuncu, VNL Franqueira, S Li - Frontiers in Big Data, 2024 - frontiersin.org
Recent advancements in AI, especially deep learning, have contributed to a significant
increase in the creation of new realistic-looking synthetic media (video, image, and audio) …

Multimodal self-instruct: Synthetic abstract image and visual reasoning instruction using language model

W Zhang, Z Cheng, Y He, M Wang, Y Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
Although most current large multimodal models (LMMs) can already understand photos of
natural scenes and portraits, their understanding of abstract images, e.g., charts, maps, or …