LayoutGPT: Compositional visual planning and generation with large language models
Attaining a high degree of user controllability in visual generation often requires intricate,
fine-grained inputs like layouts. However, such inputs impose a substantial burden on users …
BoxDiff: Text-to-image synthesis with training-free box-constrained diffusion
Recent text-to-image diffusion models have demonstrated an astonishing capacity to
generate high-quality images. However, researchers mainly studied the way of synthesizing …
TokenFlow: Consistent diffusion features for consistent video editing
The generative AI revolution has recently expanded to videos. Nevertheless, current state-of-
the-art video models are still lagging behind image models in terms of visual quality and …
Expressive text-to-image generation with rich text
Plain text has become a prevalent interface for text-to-image synthesis. However, its limited
customization options hinder users from accurately describing desired outputs. For example …
Grounded text-to-image synthesis with attention refocusing
Driven by scalable diffusion models trained on large-scale datasets, text-to-image
synthesis methods have shown compelling results. However, these models still fail to …
Mix-of-Show: Decentralized low-rank adaptation for multi-concept customization of diffusion models
Public large-scale text-to-image diffusion models, such as Stable Diffusion, have gained
significant attention from the community. These models can be easily customized for new …
Space-time diffusion features for zero-shot text-driven motion transfer
We present a new method for text-driven motion transfer: synthesizing a video that complies
with an input text prompt describing the target objects and scene while maintaining an input …
Zero-shot spatial layout conditioning for text-to-image diffusion models
Large-scale text-to-image diffusion models have significantly improved the state of the art in
generative image modeling and allow for an intuitive and powerful user interface to drive the …
Compositional text-to-image synthesis with attention map control of diffusion models
Recent text-to-image (T2I) diffusion models show outstanding performance in generating
high-quality images conditioned on textual prompts. However, they fail to semantically align …
Unveiling and mitigating memorization in text-to-image diffusion models through cross attention
Recent advancements in text-to-image (T2I) diffusion models have demonstrated their
remarkable capability to generate high-quality images from textual prompts. However …