VideoComposer: Compositional video synthesis with motion controllability

X Wang, H Yuan, S Zhang, D Chen… - Advances in …, 2024 - proceedings.neurips.cc
The pursuit of controllability as a higher standard of visual content creation has yielded
remarkable progress in customizable image synthesis. However, achieving controllable …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
This paper presents a comprehensive survey of the taxonomy and evolution of multimodal foundation
models that demonstrate vision and vision-language capabilities, focusing on the transition from specialist models …

RAPHAEL: Text-to-image generation via large mixture of diffusion paths

Z Xue, G Song, Q Guo, B Liu, Z Zong… - Advances in Neural …, 2024 - proceedings.neurips.cc
Text-to-image generation has recently witnessed remarkable achievements. We introduce a
text-conditional image diffusion model, termed RAPHAEL, to generate highly artistic images …

SparseCtrl: Adding sparse controls to text-to-video diffusion models

Y Guo, C Yang, A Rao, M Agrawala, D Lin… - European Conference on …, 2025 - Springer
The development of text-to-video (T2V), i.e., generating videos with a given text prompt, has
been significantly advanced in recent years. However, relying solely on text prompts often …

MotionDirector: Motion customization of text-to-video diffusion models

R Zhao, Y Gu, JZ Wu, DJ Zhang, JW Liu, W Wu… - … on Computer Vision, 2025 - Springer
Large-scale pre-trained diffusion models have exhibited remarkable capabilities in diverse
video generations. Given a set of video clips of the same motion concept, the task of Motion …

MomentDiff: Generative video moment retrieval from random to real

P Li, CW Xie, H Xie, L Zhao, L Zhang… - Advances in neural …, 2024 - proceedings.neurips.cc
Video moment retrieval pursues an efficient and generalized solution to identify the specific
temporal segments within an untrimmed video that correspond to a given language …

Cones 2: Customizable image synthesis with multiple subjects

Z Liu, Y Zhang, Y Shen, K Zheng, K Zhu… - Proceedings of the 37th …, 2023 - dl.acm.org
Synthesizing images with user-specified subjects has received growing attention due to its
practical applications. Despite the recent success in single subject customization, existing …

FreeControl: Training-free spatial control of any text-to-image diffusion model with any condition

S Mo, F Mu, KH Lin, Y Liu, B Guan… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent approaches such as ControlNet offer users fine-grained spatial control over text-to-
image (T2I) diffusion models. However, auxiliary modules have to be trained for each spatial …

Diffusion model-based image editing: A survey

Y Huang, J Huang, Y Liu, M Yan, J Lv, J Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Denoising diffusion models have emerged as a powerful tool for various image generation
and editing tasks, facilitating the synthesis of visual content in an unconditional or input …

SSR-Encoder: Encoding selective subject representation for subject-driven generation

Y Zhang, Y Song, J Liu, R Wang, J Yu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advancements in subject-driven image generation have led to zero-shot generation,
yet precise selection and focus on crucial subject representations remain challenging …