A survey on video diffusion models

Z Xing, Q Feng, H Chen, Q Dai, H Hu, H Xu… - ACM Computing …, 2024 - dl.acm.org
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …

Sora: A review on background, technology, limitations, and opportunities of large vision models

Y Liu, K Zhang, Y Li, Z Yan, C Gao, R Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The
model is trained to generate videos of realistic or imaginative scenes from text instructions …

AnimateDiff: Animate your personalized text-to-image diffusion models without specific tuning

Y Guo, C Yang, A Rao, Z Liang, Y Wang, Y Qiao… - arXiv preprint arXiv …, 2023 - arxiv.org
With the advance of text-to-image (T2I) diffusion models (e.g., Stable Diffusion) and
corresponding personalization techniques such as DreamBooth and LoRA, everyone can …

Preserve your own correlation: A noise prior for video diffusion models

S Ge, S Nah, G Liu, T Poon, A Tao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Despite tremendous progress in generating high-quality images using diffusion models,
synthesizing a sequence of animated frames that are both photorealistic and temporally …

NExT-GPT: Any-to-any multimodal LLM

S Wu, H Fei, L Qu, W Ji, TS Chua - arXiv preprint arXiv:2309.05519, 2023 - arxiv.org
While recently Multimodal Large Language Models (MM-LLMs) have made exciting strides,
they mostly fall prey to the limitation of only input-side multimodal understanding, without the …

Video-P2P: Video editing with cross-attention control

S Liu, Y Zhang, W Li, Z Lin, J Jia - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Video-P2P is the first framework for real-world video editing with cross-attention control.
While attention control has proven effective for image editing with pre-trained image …

VideoPoet: A large language model for zero-shot video generation

D Kondratyuk, L Yu, X Gu, J Lezama, J Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
We present VideoPoet, a language model capable of synthesizing high-quality video, with
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder …

Rerender a video: Zero-shot text-guided video-to-video translation

S Yang, Y Zhou, Z Liu, CC Loy - SIGGRAPH Asia 2023 Conference …, 2023 - dl.acm.org
Large text-to-image diffusion models have exhibited impressive proficiency in generating
high-quality images. However, when applying these models to video domain, ensuring …

TokenFlow: Consistent diffusion features for consistent video editing

M Geyer, O Bar-Tal, S Bagon, T Dekel - arXiv preprint arXiv:2307.10373, 2023 - arxiv.org
The generative AI revolution has recently expanded to videos. Nevertheless, current state-of-
the-art video models are still lagging behind image models in terms of visual quality and …

VBench: Comprehensive benchmark suite for video generative models

Z Huang, Y He, J Yu, F Zhang, C Si… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video generation has witnessed significant advancements yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …