A review of recurrent neural networks: LSTM cells and network architectures

Y Yu, X Si, C Hu, J Zhang - Neural computation, 2019 - direct.mit.edu

Recurrent neural networks (RNNs) have been widely adopted in research areas concerned
with sequential data, such as text, audio, and video. However, RNNs consisting of sigma …

Cited by 3530 Related articles All 6 versions

Image and video compression with neural networks: A review

S Ma, X Zhang, C Jia, Z Zhao, S Wang… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org

In recent years, the image and video coding technologies have advanced by leaps and
bounds. However, due to the popularization of image and video acquisition devices, the …

Cited by 388 Related articles All 5 versions

Imagen Video: High definition video generation with diffusion models

J Ho, W Chan, C Saharia, J Whang, R Gao… - arXiv preprint arXiv …, 2022 - arxiv.org

We present Imagen Video, a text-conditional video generation system based on a cascade
of video diffusion models. Given a text prompt, Imagen Video generates high definition …

Cited by 928 Related articles All 4 versions

Preserve your own correlation: A noise prior for video diffusion models

S Ge, S Nah, G Liu, T Poon, A Tao… - Proceedings of the …, 2023 - openaccess.thecvf.com

Despite tremendous progress in generating high-quality images using diffusion models,
synthesizing a sequence of animated frames that are both photorealistic and temporally …

Cited by 133 Related articles All 6 versions

Phenaki: Variable length video generation from open domain textual descriptions

R Villegas, M Babaeizadeh, PJ Kindermans… - International …, 2022 - openreview.net

We present Phenaki, a model capable of realistic video synthesis given a sequence of
textual prompts. Generating videos from text is particularly challenging due to the …

Cited by 260 Related articles All 5 versions

Sequential modeling enables scalable learning for large vision models

Y Bai, X Geng, K Mangalam, A Bar… - Proceedings of the …, 2024 - openaccess.thecvf.com

We introduce a novel sequential modeling approach which enables learning a Large Vision
Model (LVM) without making use of any linguistic data. To do this we define a common …

Cited by 56 Related articles All 3 versions

SimVP: Simpler yet better video prediction

Z Gao, C Tan, L Wu, SZ Li - … of the IEEE/CVF conference on …, 2022 - openaccess.thecvf.com

From CNN, RNN, to ViT, we have witnessed remarkable advancements in video
prediction, incorporating auxiliary inputs, elaborate neural architectures, and sophisticated …

Cited by 164 Related articles All 6 versions

StyleGAN-V: A continuous video generator with the price, image quality and perks of StyleGAN2

I Skorokhodov, S Tulyakov… - Proceedings of the …, 2022 - openaccess.thecvf.com

Videos show continuous events, yet most--if not all--video synthesis frameworks treat them
discretely in time. In this work, we think of videos of what they should be--time-continuous …

Cited by 223 Related articles All 9 versions

Long video generation with time-agnostic VQGAN and time-sensitive transformer

S Ge, T Hayes, H Yang, X Yin, G Pang… - … on Computer Vision, 2022 - Springer

Videos are created to express emotion, exchange information, and share experiences.
Video synthesis has intrigued researchers for a long time. Despite the rapid progress driven …

Cited by 150 Related articles All 5 versions

PredRNN: A recurrent neural network for spatiotemporal predictive learning

Y Wang, H Wu, J Zhang, Z Gao, J Wang… - … on Pattern Analysis …, 2022 - ieeexplore.ieee.org

The predictive learning of spatiotemporal sequences aims to generate future images by
learning from the historical context, where the visual dynamics are believed to have modular …

Cited by 310 Related articles All 6 versions