Diffusion models: A comprehensive survey of methods and applications

L Yang, Z Zhang, Y Song, S Hong, R Xu, Y Zhao… - ACM Computing …, 2023 - dl.acm.org
Diffusion models have emerged as a powerful new family of deep generative models with
record-breaking performance in many applications, including image synthesis, video …

A survey on video diffusion models

Z Xing, Q Feng, H Chen, Q Dai, H Hu, H Xu… - ACM Computing …, 2024 - dl.acm.org
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …

AnimateDiff: Animate your personalized text-to-image diffusion models without specific tuning

Y Guo, C Yang, A Rao, Z Liang, Y Wang, Y Qiao… - arXiv preprint arXiv …, 2023 - arxiv.org
With the advance of text-to-image (T2I) diffusion models (e.g., Stable Diffusion) and
corresponding personalization techniques such as DreamBooth and LoRA, everyone can …

Preserve your own correlation: A noise prior for video diffusion models

S Ge, S Nah, G Liu, T Poon, A Tao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Despite tremendous progress in generating high-quality images using diffusion models,
synthesizing a sequence of animated frames that are both photorealistic and temporally …

VideoComposer: Compositional video synthesis with motion controllability

X Wang, H Yuan, S Zhang, D Chen… - Advances in …, 2024 - proceedings.neurips.cc
The pursuit of controllability as a higher standard of visual content creation has yielded
remarkable progress in customizable image synthesis. However, achieving controllable …

DynamiCrafter: Animating open-domain images with video diffusion priors

J Xing, M Xia, Y Zhang, H Chen, W Yu, H Liu… - … on Computer Vision, 2025 - Springer
Animating a still image offers an engaging visual experience. Traditional image animation
techniques mainly focus on animating natural scenes with stochastic dynamics (e.g., clouds …

Show-1: Marrying pixel and latent diffusion models for text-to-video generation

DJ Zhang, JZ Wu, JW Liu, R Zhao, L Ran, Y Gu… - International Journal of …, 2024 - Springer
Significant advancements have been achieved in the realm of large-scale pre-trained text-to-
video Diffusion Models (VDMs). However, previous methods either rely solely on pixel …

Align your Gaussians: Text-to-4D with dynamic 3D Gaussians and composed diffusion models

H Ling, SW Kim, A Torralba… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-guided diffusion models have revolutionized image and video generation and have
also been successfully used for optimization-based 3D object synthesis. Here we instead …

Rerender a video: Zero-shot text-guided video-to-video translation

S Yang, Y Zhou, Z Liu, CC Loy - SIGGRAPH Asia 2023 Conference …, 2023 - dl.acm.org
Large text-to-image diffusion models have exhibited impressive proficiency in generating
high-quality images. However, when applying these models to the video domain, ensuring …

CoDeF: Content deformation fields for temporally consistent video processing

H Ouyang, Q Wang, Y Xiao, Q Bai… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present the content deformation field (CoDeF) as a new type of video representation
which consists of a canonical content field aggregating the static contents in the entire video …