VideoCrafter2: Overcoming data limitations for high-quality video diffusion models

Y Liu, K Zhang, Y Li, Z Yan, C Gao, R Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The
model is trained to generate videos of realistic or imaginative scenes from text instructions …

Cited by 134 · All 2 versions

[PDF] thecvf.com

VBench: Comprehensive benchmark suite for video generative models

Z Huang, Y He, J Yu, F Zhang, C Si… - Proceedings of the …, 2024 - openaccess.thecvf.com

Video generation has witnessed significant advancements yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …

Cited by 93 · All 4 versions

[PDF] ieee.org

When does Sora show: The beginning of TAO to imaginative intelligence and scenarios engineering

FY Wang, Q Miao, L Li, Q Ni, X Li, J Li… - IEEE/CAA Journal of …, 2024 - ieeexplore.ieee.org

During our discussion at workshops for writing “What Does ChatGPT Say: The DAO from
Algorithmic Intelligence to Linguistic Intelligence”[1], we had expected the next milestone for …

Cited by 45 · All 4 versions

[PDF] arxiv.org

TC4D: Trajectory-conditioned text-to-4D generation

S Bahmani, X Liu, W Yifan, I Skorokhodov… - … on Computer Vision, 2025 - Springer

Recent techniques for text-to-4D generation synthesize dynamic 3D scenes using
supervision from pre-trained text-to-video models. However, existing representations, such …

Cited by 14 · All 3 versions

[PDF] openreview.net

LEO: Generative latent image animator for human video synthesis

Y Wang, X Ma, X Chen, C Chen, A Dantcheva… - International Journal of …, 2024 - Springer

Spatio-temporal coherency is a major challenge in synthesizing high quality videos,
particularly in synthesizing human videos that contain rich global and local deformations. To …

Cited by 21 · All 3 versions

[PDF] arxiv.org

Advances in 3D generation: A survey

X Li, Q Zhang, D Kang, W Cheng, Y Gao… - arXiv preprint arXiv …, 2024 - arxiv.org

Generating 3D models lies at the core of computer graphics and has been the focus of
decades of research. With the emergence of advanced neural representations and …

Cited by 17 · All 2 versions

[PDF] arxiv.org

AnyV2V: A plug-and-play framework for any video-to-video editing tasks

M Ku, C Wei, W Ren, H Yang, W Chen - arXiv preprint arXiv:2403.14468, 2024 - arxiv.org

Video-to-video editing involves editing a source video along with additional control (such as
text prompts, subjects, or styles) to generate a new video that aligns with the source video …

Cited by 9 · All 2 versions

[PDF] arxiv.org

MimicMotion: High-quality human motion video generation with confidence-aware pose guidance

Y Zhang, J Gu, LW Wang, H Wang, J Cheng… - arXiv preprint arXiv …, 2024 - arxiv.org

In recent years, generative artificial intelligence has achieved significant advancements in
the field of image generation, spawning a variety of applications. However, video generation …

Cited by 9 · All 2 versions

[PDF] arxiv.org

MiraData: A large-scale video dataset with long durations and structured captions

X Ju, Y Gao, Z Zhang, Z Yuan, X Wang, A Zeng… - arXiv preprint arXiv …, 2024 - arxiv.org

Sora's high-motion intensity and long consistent videos have significantly impacted the field
of video generation, attracting unprecedented attention. However, existing publicly available …

Cited by 6 · All 2 versions

[PDF] arxiv.org

MotionBooth: Motion-aware customized text-to-video generation

J Wu, X Li, Y Zeng, J Zhang, Q Zhou, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org

In this work, we present MotionBooth, an innovative framework designed for animating
customized subjects with precise control over both object and camera movements. By …

Cited by 4 · All 3 versions