Non-uniform timestep sampling: Towards faster diffusion model training
Diffusion models have garnered significant success in generative tasks, emerging as the
predominant model in this domain. Despite their success, the substantial computational …
Multi-modal generative AI: Multi-modal LLM, diffusion and beyond
Multi-modal generative AI has received increasing attention in both academia and industry.
Particularly, two dominant families of techniques are: i) The multi-modal large language …
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
Identity-preserving text-to-video (IPT2V) generation aims to create high-fidelity videos with
consistent human identity. It is an important task in video generation but remains an open …
CamI2V: Camera-Controlled Image-to-Video Diffusion Model
Recently, camera pose, as a user-friendly and physics-related condition, has been
introduced into text-to-video diffusion models for camera control. However, existing methods …
Motion Prompting: Controlling Video Generation with Motion Trajectories
Motion control is crucial for generating expressive and compelling video content; however,
most existing video generation models rely mainly on text prompts for control, which struggle …
Trajectory Attention for Fine-grained Video Motion Control
Recent advancements in video generation have been greatly driven by video diffusion
models, with camera motion control emerging as a crucial challenge in creating view …
MIMAFace: Face Animation via Motion-Identity Modulated Appearance Feature Learning
Current diffusion-based face animation methods generally adopt a ReferenceNet (a copy of
U-Net) and a large amount of curated self-acquired data to learn appearance features, as …
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Story visualization, the task of creating visual narratives from textual descriptions, has seen
progress with text-to-image generation models. However, these models often lack effective …
SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device
We have witnessed the unprecedented success of diffusion-based video generation over
the past year. Recently proposed models from the community have wielded the power to …
RelationBooth: Towards Relation-Aware Customized Object Generation
Customized image generation is crucial for delivering personalized content based on user-
provided image prompts, aligning large-scale text-to-image diffusion models with individual …