VideoComposer: Compositional video synthesis with motion controllability
The pursuit of controllability as a higher standard of visual content creation has yielded
remarkable progress in customizable image synthesis. However, achieving controllable …
Multimodal foundation models: From specialists to general-purpose assistants
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …
RAPHAEL: Text-to-image generation via large mixture of diffusion paths
Text-to-image generation has recently witnessed remarkable achievements. We introduce a
text-conditional image diffusion model, termed RAPHAEL, to generate highly artistic images …
SparseCtrl: Adding sparse controls to text-to-video diffusion models
The development of text-to-video (T2V), i.e., generating videos with a given text prompt, has
been significantly advanced in recent years. However, relying solely on text prompts often …
MotionDirector: Motion customization of text-to-video diffusion models
Large-scale pre-trained diffusion models have exhibited remarkable capabilities in diverse
video generations. Given a set of video clips of the same motion concept, the task of Motion …
MomentDiff: Generative video moment retrieval from random to real
Video moment retrieval pursues an efficient and generalized solution to identify the specific
temporal segments within an untrimmed video that correspond to a given language …
Cones 2: Customizable image synthesis with multiple subjects
Synthesizing images with user-specified subjects has received growing attention due to its
practical applications. Despite the recent success in single subject customization, existing …
FreeControl: Training-free spatial control of any text-to-image diffusion model with any condition
Recent approaches such as ControlNet offer users fine-grained spatial control over text-to-image
(T2I) diffusion models. However, auxiliary modules have to be trained for each spatial …
Diffusion model-based image editing: A survey
Denoising diffusion models have emerged as a powerful tool for various image generation
and editing tasks, facilitating the synthesis of visual content in an unconditional or input …
SSR-Encoder: Encoding selective subject representation for subject-driven generation
Recent advancements in subject-driven image generation have led to zero-shot generation,
yet precise selection and focus on crucial subject representations remain challenging …