Evalcrafter: Benchmarking and evaluating large video generation models

Z Xing, Q Feng, H Chen, Q Dai, H Hu, H Xu… - ACM Computing …, 2024 - dl.acm.org

The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …

被引用次数：76 相关文章所有 3 个版本

[PDF] springer.com

Perceptual video quality assessment: A survey

X Min, H Duan, W Sun, Y Zhu, G Zhai - Science China Information …, 2024 - Springer

Perceptual video quality assessment plays a vital role in the field of video processing due to
the existence of quality degradations introduced in various stages of video signal …

被引用次数：68 相关文章所有 2 个版本

[PDF] arxiv.org

Videopoet: A large language model for zero-shot video generation

D Kondratyuk, L Yu, X Gu, J Lezama, J Huang… - arXiv preprint arXiv …, 2023 - arxiv.org

We present VideoPoet, a language model capable of synthesizing high-quality video, with
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder …

被引用次数：156 相关文章所有 5 个版本

[PDF] thecvf.com

Vbench: Comprehensive benchmark suite for video generative models

Z Huang, Y He, J Yu, F Zhang, C Si… - Proceedings of the …, 2024 - openaccess.thecvf.com

Video generation has witnessed significant advancements yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …

被引用次数：167 相关文章所有 4 个版本

[PDF] arxiv.org

Evaluating text-to-visual generation with image-to-text generation

Z Lin, D Pathak, B Li, J Li, X Xia, G Neubig… - … on Computer Vision, 2025 - Springer

Despite significant progress in generative AI, comprehensive evaluation remains
challenging because of the lack of effective metrics and standardized benchmarks. For …

被引用次数：52 相关文章所有 2 个版本

[PDF] thecvf.com

Fairy: Fast parallelized instruction-guided video-to-video synthesis

B Wu, CY Chuang, X Wang, Y Jia… - Proceedings of the …, 2024 - openaccess.thecvf.com

In this paper we introduce Fairy a minimalist yet robust adaptation of image-editing diffusion
models enhancing them for video editing applications. Our approach centers on the concept …

被引用次数：21 相关文章所有 3 个版本

[PDF] arxiv.org

Miradata: A large-scale video dataset with long durations and structured captions

X Ju, Y Gao, Z Zhang, Z Yuan, X Wang, A Zeng… - arXiv preprint arXiv …, 2024 - arxiv.org

Sora's high-motion intensity and long consistent videos have significantly impacted the field
of video generation, attracting unprecedented attention. However, existing publicly available …

被引用次数：22 相关文章所有 3 个版本

[PDF] arxiv.org

T2v-compbench: A comprehensive benchmark for compositional text-to-video generation

K Sun, K Huang, X Liu, Y Wu, Z Xu, Z Li… - arXiv preprint arXiv …, 2024 - arxiv.org

Text-to-video (T2V) generation models have advanced significantly, yet their ability to
compose different objects, attributes, actions, and motions into a video remains unexplored …

被引用次数：13 相关文章所有 2 个版本

[PDF] arxiv.org

Videoscore: Building automatic metrics to simulate fine-grained human feedback for video generation

X He, D Jiang, G Zhang, M Ku, A Soni, S Siu… - arXiv preprint arXiv …, 2024 - arxiv.org

The recent years have witnessed great advances in video generation. However, the
development of automatic video metrics is lagging significantly behind. None of the existing …

被引用次数：14 相关文章所有 3 个版本

[PDF] arxiv.org

Chronomagic-bench: A benchmark for metamorphic evaluation of text-to-time-lapse video generation

S Yuan, J Huang, Y Xu, Y Liu, S Zhang, Y Shi… - arXiv preprint arXiv …, 2024 - arxiv.org

We propose a novel text-to-video (T2V) generation benchmark, ChronoMagic-Bench, to
evaluate the temporal and metamorphic capabilities of the T2V models (eg Sora and …

被引用次数：16 相关文章所有 4 个版本