VideoComposer: Compositional video synthesis with motion controllability

X Wang, H Yuan, S Zhang, D Chen… - Advances in …, 2024 - proceedings.neurips.cc
The pursuit of controllability as a higher standard of visual content creation has yielded
remarkable progress in customizable image synthesis. However, achieving controllable …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
This paper presents a comprehensive survey of the taxonomy and evolution of multimodal foundation
models that demonstrate vision and vision-language capabilities, focusing on the transition from specialist models …

RAPHAEL: Text-to-image generation via large mixture of diffusion paths

Z Xue, G Song, Q Guo, B Liu, Z Zong… - Advances in Neural …, 2024 - proceedings.neurips.cc
Text-to-image generation has recently witnessed remarkable achievements. We introduce a
text-conditional image diffusion model, termed RAPHAEL, to generate highly artistic images …

SparseCtrl: Adding sparse controls to text-to-video diffusion models

Y Guo, C Yang, A Rao, M Agrawala, D Lin… - European Conference on …, 2025 - Springer
The development of text-to-video (T2V), i.e., generating videos with a given text prompt, has
been significantly advanced in recent years. However, relying solely on text prompts often …

MotionDirector: Motion customization of text-to-video diffusion models

R Zhao, Y Gu, JZ Wu, DJ Zhang, JW Liu, W Wu… - … on Computer Vision, 2025 - Springer
Large-scale pre-trained diffusion models have exhibited remarkable capabilities in diverse
video generations. Given a set of video clips of the same motion concept, the task of Motion …

MomentDiff: Generative video moment retrieval from random to real

P Li, CW Xie, H Xie, L Zhao, L Zhang… - Advances in neural …, 2024 - proceedings.neurips.cc
Video moment retrieval pursues an efficient and generalized solution to identify the specific
temporal segments within an untrimmed video that correspond to a given language …

Cones 2: Customizable image synthesis with multiple subjects

Z Liu, Y Zhang, Y Shen, K Zheng, K Zhu… - Proceedings of the 37th …, 2023 - dl.acm.org
Synthesizing images with user-specified subjects has received growing attention due to its
practical applications. Despite the recent success in single subject customization, existing …

FreeControl: Training-free spatial control of any text-to-image diffusion model with any condition

S Mo, F Mu, KH Lin, Y Liu, B Guan… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent approaches such as ControlNet offer users fine-grained spatial control over text-to-
image (T2I) diffusion models. However, auxiliary modules have to be trained for each spatial …

Diffusion model-based image editing: A survey

Y Huang, J Huang, Y Liu, M Yan, J Lv, J Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Denoising diffusion models have emerged as a powerful tool for various image generation
and editing tasks, facilitating the synthesis of visual content in an unconditional or input …

SSR-Encoder: Encoding selective subject representation for subject-driven generation

Y Zhang, Y Song, J Liu, R Wang, J Yu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advancements in subject-driven image generation have led to zero-shot generation,
yet precise selection and focus on crucial subject representations remain challenging …