Imagen video: High definition video generation with diffusion models- 学术资源搜索

Imagen video: High definition video generation with diffusion models

J Ho, W Chan, C Saharia, J Whang, R Gao… - arXiv preprint arXiv …, 2022 - arxiv.org

J Ho, W Chan, C Saharia, J Whang, R Gao, A Gritsenko, DP Kingma, B Poole, M Norouzi…

arXiv preprint arXiv:2210.02303, 2022•arxiv.org

We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models. Given a text prompt, Imagen Video generates high definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models. We describe how we scale up the system as a high definition text-to-video model including design decisions such as the choice of fully-convolutional temporal and spatial super-resolution models at certain resolutions, and the choice of the v-parameterization of diffusion models. In addition, we confirm and transfer findings from previous work on diffusion-based image generation to the video generation setting. Finally, we apply progressive distillation to our video models with classifier-free guidance for fast, high quality sampling. We find Imagen Video not only capable of generating videos of high fidelity, but also having a high degree of controllability and world knowledge, including the ability to generate diverse videos and text animations in various artistic styles and with 3D object understanding. See https://imagen.research.google/video/ for samples.

arxiv.org

展开收起

被引用次数：1310 相关文章所有 4 个版本

以上显示的是最相近的搜索结果。查看全部搜索结果