A review of recurrent neural networks: LSTM cells and network architectures

Y Yu, X Si, C Hu, J Zhang - Neural computation, 2019 - direct.mit.edu
Recurrent neural networks (RNNs) have been widely adopted in research areas concerned
with sequential data, such as text, audio, and video. However, RNNs consisting of sigma …

Image and video compression with neural networks: A review

S Ma, X Zhang, C Jia, Z Zhao, S Wang… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
In recent years, the image and video coding technologies have advanced by leaps and
bounds. However, due to the popularization of image and video acquisition devices, the …

Imagen video: High definition video generation with diffusion models

J Ho, W Chan, C Saharia, J Whang, R Gao… - arXiv preprint arXiv …, 2022 - arxiv.org
We present Imagen Video, a text-conditional video generation system based on a cascade
of video diffusion models. Given a text prompt, Imagen Video generates high definition …

Preserve your own correlation: A noise prior for video diffusion models

S Ge, S Nah, G Liu, T Poon, A Tao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Despite tremendous progress in generating high-quality images using diffusion models,
synthesizing a sequence of animated frames that are both photorealistic and temporally …

Phenaki: Variable length video generation from open domain textual descriptions

R Villegas, M Babaeizadeh, PJ Kindermans… - International …, 2022 - openreview.net
We present Phenaki, a model capable of realistic video synthesis given a sequence of
textual prompts. Generating videos from text is particularly challenging due to the …

Sequential modeling enables scalable learning for large vision models

Y Bai, X Geng, K Mangalam, A Bar… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce a novel sequential modeling approach which enables learning a Large Vision
Model (LVM) without making use of any linguistic data. To do this we define a common …

SimVP: Simpler yet better video prediction

Z Gao, C Tan, L Wu, SZ Li - … of the IEEE/CVF conference on …, 2022 - openaccess.thecvf.com
From CNN, RNN, to ViT, we have witnessed remarkable advancements in video
prediction, incorporating auxiliary inputs, elaborate neural architectures, and sophisticated …

StyleGAN-V: A continuous video generator with the price, image quality and perks of StyleGAN2

I Skorokhodov, S Tulyakov… - Proceedings of the …, 2022 - openaccess.thecvf.com
Videos show continuous events, yet most--if not all--video synthesis frameworks treat them
discretely in time. In this work, we think of videos of what they should be--time-continuous …

Long video generation with time-agnostic VQGAN and time-sensitive transformer

S Ge, T Hayes, H Yang, X Yin, G Pang… - … on Computer Vision, 2022 - Springer
Videos are created to express emotion, exchange information, and share experiences.
Video synthesis has intrigued researchers for a long time. Despite the rapid progress driven …

PredRNN: A recurrent neural network for spatiotemporal predictive learning

Y Wang, H Wu, J Zhang, Z Gao, J Wang… - … on Pattern Analysis …, 2022 - ieeexplore.ieee.org
The predictive learning of spatiotemporal sequences aims to generate future images by
learning from the historical context, where the visual dynamics are believed to have modular …