Non-uniform timestep sampling: Towards faster diffusion model training

T Zheng, C Geng, PT Jiang, B Wan, H Zhang… - Proceedings of the …, 2024 - dl.acm.org
Diffusion models have garnered significant success in generative tasks, emerging as the
predominant model in this domain. Despite their success, the substantial computational …

Multi-modal generative AI: Multi-modal LLM, diffusion and beyond

H Chen, X Wang, Y Zhou, B Huang, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-modal generative AI has received increasing attention in both academia and industry.
Particularly, two dominant families of techniques are: i) The multi-modal large language …

Identity-Preserving Text-to-Video Generation by Frequency Decomposition

S Yuan, J Huang, X He, Y Ge, Y Shi, L Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Identity-preserving text-to-video (IPT2V) generation aims to create high-fidelity videos with
consistent human identity. It is an important task in video generation but remains an open …

CamI2V: Camera-Controlled Image-to-Video Diffusion Model

G Zheng, T Li, R Jiang, Y Lu, T Wu, X Li - arXiv preprint arXiv:2410.15957, 2024 - arxiv.org
Recently, camera pose, as a user-friendly and physics-related condition, has been
introduced into text-to-video diffusion models for camera control. However, existing methods …

Motion Prompting: Controlling Video Generation with Motion Trajectories

D Geng, C Herrmann, J Hur, F Cole, S Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Motion control is crucial for generating expressive and compelling video content; however,
most existing video generation models rely mainly on text prompts for control, which struggle …

Trajectory Attention for Fine-grained Video Motion Control

Z Xiao, W Ouyang, Y Zhou, S Yang, L Yang, J Si… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in video generation have been greatly driven by video diffusion
models, with camera motion control emerging as a crucial challenge in creating view …

MIMAFace: Face Animation via Motion-Identity Modulated Appearance Feature Learning

Y Han, J Zhu, Y Feng, X Ji, K He, X Li, Y Liu - arXiv preprint arXiv …, 2024 - arxiv.org
Current diffusion-based face animation methods generally adopt a ReferenceNet (a copy of
U-Net) and a large amount of curated self-acquired data to learn appearance features, as …

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

J Wu, C Tang, J Wang, Y Zeng, X Li, Y Tong - arXiv preprint arXiv …, 2024 - arxiv.org
Story visualization, the task of creating visual narratives from textual descriptions, has seen
progress with text-to-image generation models. However, these models often lack effective …

SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device

Y Wu, Z Zhang, Y Li, Y Xu, A Kag, Y Sui… - arXiv preprint arXiv …, 2024 - arxiv.org
We have witnessed the unprecedented success of diffusion-based video generation over
the past year. Recently proposed models from the community have wielded the power to …

RelationBooth: Towards Relation-Aware Customized Object Generation

Q Shi, L Qi, J Wu, J Bai, J Wang, Y Tong, X Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Customized image generation is crucial for delivering personalized content based on user-
provided image prompts, aligning large-scale text-to-image diffusion models with individual …