Is Sora a world simulator? A comprehensive survey on general world models and beyond

Z Zhu, X Wang, W Zhao, C Min, N Deng, M Dou… - arXiv preprint arXiv …, 2024 - arxiv.org
General world models represent a crucial pathway toward achieving Artificial General
Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual …

Towards knowledge-driven autonomous driving

X Li, Y Bai, P Cai, L Wen, D Fu, B Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper explores the emerging knowledge-driven autonomous driving technologies. Our
investigation highlights the limitations of current autonomous driving systems, in particular …

Forging vision foundation models for autonomous driving: Challenges, methodologies, and opportunities

X Yan, H Zhang, Y Cai, J Guo, W Qiu, B Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
The rise of large foundation models, trained on extensive datasets, is revolutionizing the
field of AI. Models such as SAM, DALL-E2, and GPT-4 showcase their adaptability by …

World models for autonomous driving: An initial survey

Y Guan, H Liao, Z Li, J Hu, R Yuan, Y Li… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
In the rapidly evolving landscape of autonomous driving, the capability to accurately predict
future events and assess their implications is paramount for both safety and efficiency …

Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives

S Luo, W Chen, W Tian, R Liu, L Hou… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Foundation models have indeed made a profound impact on various fields, emerging as
pivotal components that significantly shape the capabilities of intelligent systems. In the …

Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability

S Gao, J Yang, L Chen, K Chitta, Y Qiu… - arXiv preprint arXiv …, 2024 - arxiv.org
World models can foresee the outcomes of different actions, which is of paramount
importance for autonomous driving. Nevertheless, existing driving world models still have …

OccFiner: Offboard occupancy refinement with hybrid propagation

H Shi, S Wang, J Zhang, X Yin, Z Wang, Z Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-based occupancy prediction, also known as 3D Semantic Scene Completion (SSC),
presents a significant challenge in computer vision. Previous methods, confined to onboard …

Pandora: Towards General World Model with Natural Language Actions and Video States

J Xiang, G Liu, Y Gu, Q Gao, Y Ning, Y Zha… - arXiv preprint arXiv …, 2024 - arxiv.org
World models simulate future states of the world in response to different actions. They
facilitate interactive content creation and provide a foundation for grounded, long-horizon …

OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving

L Wang, W Zheng, Y Ren, H Jiang, Z Cui, H Yu… - arXiv preprint arXiv …, 2024 - arxiv.org
Understanding the evolution of 3D scenes is important for effective autonomous driving.
While conventional methods model scene development with the motion of individual …

GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

Y Huang, W Zheng, Y Zhang, J Zhou, J Lu - arXiv preprint arXiv …, 2024 - arxiv.org
3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and semantics
of the surrounding scene and is an important task for the robustness of vision-centric …