A survey on video diffusion models

Z Xing, Q Feng, H Chen, Q Dai, H Hu, H Xu… - ACM Computing …, 2024 - dl.acm.org
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …

Perceptual video quality assessment: A survey

X Min, H Duan, W Sun, Y Zhu, G Zhai - Science China Information …, 2024 - Springer
Perceptual video quality assessment plays a vital role in the field of video processing due to
the existence of quality degradations introduced in various stages of video signal …

VideoPoet: A large language model for zero-shot video generation

D Kondratyuk, L Yu, X Gu, J Lezama, J Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
We present VideoPoet, a language model capable of synthesizing high-quality video, with
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder …

VBench: Comprehensive benchmark suite for video generative models

Z Huang, Y He, J Yu, F Zhang, C Si… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video generation has witnessed significant advancements, yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …

Evaluating text-to-visual generation with image-to-text generation

Z Lin, D Pathak, B Li, J Li, X Xia, G Neubig… - … on Computer Vision, 2025 - Springer
Despite significant progress in generative AI, comprehensive evaluation remains
challenging because of the lack of effective metrics and standardized benchmarks. For …

Fairy: Fast parallelized instruction-guided video-to-video synthesis

B Wu, CY Chuang, X Wang, Y Jia… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this paper, we introduce Fairy, a minimalist yet robust adaptation of image-editing diffusion
models, enhancing them for video editing applications. Our approach centers on the concept …

MiraData: A large-scale video dataset with long durations and structured captions

X Ju, Y Gao, Z Zhang, Z Yuan, X Wang, A Zeng… - arXiv preprint arXiv …, 2024 - arxiv.org
Sora's high-motion intensity and long consistent videos have significantly impacted the field
of video generation, attracting unprecedented attention. However, existing publicly available …

T2V-CompBench: A comprehensive benchmark for compositional text-to-video generation

K Sun, K Huang, X Liu, Y Wu, Z Xu, Z Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-video (T2V) generation models have advanced significantly, yet their ability to
compose different objects, attributes, actions, and motions into a video remains unexplored …

VideoScore: Building automatic metrics to simulate fine-grained human feedback for video generation

X He, D Jiang, G Zhang, M Ku, A Soni, S Siu… - arXiv preprint arXiv …, 2024 - arxiv.org
The recent years have witnessed great advances in video generation. However, the
development of automatic video metrics is lagging significantly behind. None of the existing …

ChronoMagic-Bench: A benchmark for metamorphic evaluation of text-to-time-lapse video generation

S Yuan, J Huang, Y Xu, Y Liu, S Zhang, Y Shi… - arXiv preprint arXiv …, 2024 - arxiv.org
We propose a novel text-to-video (T2V) generation benchmark, ChronoMagic-Bench, to
evaluate the temporal and metamorphic capabilities of the T2V models (e.g., Sora and …