A survey on video diffusion models

Z Xing, Q Feng, H Chen, Q Dai, H Hu, H Xu… - ACM Computing …, 2023 - dl.acm.org
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …

Sora: A review on background, technology, limitations, and opportunities of large vision models

Y Liu, K Zhang, Y Li, Z Yan, C Gao, R Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The
model is trained to generate videos of realistic or imaginative scenes from text instructions …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

Panda-70m: Captioning 70m videos with multiple cross-modality teachers

TS Chen, A Siarohin, W Menapace… - Proceedings of the …, 2024 - openaccess.thecvf.com
The quality of the data and annotation upper-bounds the quality of a downstream model.
While there exist large text corpora and image-text pairs high-quality video-text data is much …

Videopoet: A large language model for zero-shot video generation

D Kondratyuk, L Yu, X Gu, J Lezama, J Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
We present VideoPoet, a language model capable of synthesizing high-quality video, with
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder …

Make pixels dance: High-dynamic video generation

Y Zeng, G Wei, J Zheng, J Zou, Y Wei… - Proceedings of the …, 2024 - openaccess.thecvf.com
Creating high-dynamic videos such as motion-rich actions and sophisticated visual effects
poses a significant challenge in the field of artificial intelligence. Unfortunately current state …

State of the art on diffusion models for visual computing

R Po, W Yifan, V Golyanik, K Aberman… - Computer Graphics …, 2024 - Wiley Online Library
The field of visual computing is rapidly advancing due to the emergence of generative
artificial intelligence (AI), which unlocks unprecedented capabilities for the generation …

When does Sora show: The beginning of TAO to imaginative intelligence and scenarios engineering

FY Wang, Q Miao, L Li, Q Ni, X Li, J Li… - IEEE/CAA Journal of …, 2024 - ieeexplore.ieee.org
During our discussion at workshops for writing “What Does ChatGPT Say: The DAO from
Algorithmic Intelligence to Linguistic Intelligence”[1], we had expected the next milestone for …

Dreamgaussian4d: Generative 4d gaussian splatting

J Ren, L Pan, J Tang, C Zhang, A Cao, G Zeng… - arXiv preprint arXiv …, 2023 - arxiv.org
Remarkable progress has been made in 4D content generation recently. However, existing
methods suffer from long optimization time, lack of motion controllability, and a low level of …

Driving into the future: Multiview visual forecasting and planning with world model for autonomous driving

Y Wang, J He, L Fan, H Li, Y Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
In autonomous driving predicting future events in advance and evaluating the foreseeable
risks empowers autonomous vehicles to plan their actions enhancing safety and efficiency …