Chronomagic-bench: A benchmark for metamorphic evaluation of text-to-time-lapse video generation

Z Yang, J Teng, W Zheng, M Ding, S Huang… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce CogVideoX, a large-scale diffusion transformer model designed for generating
videos based on text prompts. To efficiently model video data, we propose to leverage a 3D …

Cited by 13 · Related articles · All 2 versions

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

F Meng, J Liao, X Tan, W Shao, Q Lu, K Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

Text-to-video (T2V) models like Sora have made significant strides in visualizing complex
prompts, which is increasingly viewed as a promising path towards constructing the …

CASA: Class-Agnostic Shared Attributes in Vision-Language Models for Efficient Incremental Object Detection

M Guo, Y Liu, Z Lin, P Peng, Y Tian - arXiv preprint arXiv:2410.05804, 2024 - arxiv.org

Incremental object detection (IOD) is challenged by background shift, where background
categories in sequential data may include previously learned or future classes. Inspired by …

IV-Mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video Synthesis

S Shao, Z Zhou, L Bai, H Xiong, Z Xie - arXiv preprint arXiv:2410.04171, 2024 - arxiv.org

The multi-step sampling mechanism, a key feature of visual diffusion models, has significant
potential to replicate the success of OpenAI's Strawberry in enhancing performance by …

OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model

L Chen, Z Li, B Lin, B Zhu, Q Wang, S Yuan… - arXiv preprint arXiv …, 2024 - arxiv.org

Variational Autoencoder (VAE), compressing videos into latent representations, is a crucial
preceding component of Latent Video Diffusion Models (LVDMs). With the same …

BitQ: Tailoring Block Floating Point Precision for Improved DNN Efficiency on Resource-Constrained Devices

Y Xu, Y Lee, G Yi, B Liu, Y Chen, P Liu, J Wu… - arXiv preprint arXiv …, 2024 - arxiv.org

Deep neural networks (DNNs) are powerful for cognitive tasks such as image classification,
object detection, and scene segmentation. One drawback, however, is the significantly high …