SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model

AC Cheng, H Yin, Y Fu, Q Guo, R Yang, J Kautz… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision Language Models (VLMs) have demonstrated remarkable performance in 2D vision
and language tasks. However, their ability to reason about spatial arrangements remains …

DiffCalib: Reformulating Monocular Camera Calibration as Diffusion-Based Dense Incident Map Generation

X He, G Xu, B Zhang, H Chen, Y Cui, D Guo - arXiv preprint arXiv …, 2024 - arxiv.org
Monocular camera calibration is a key precondition for numerous 3D vision applications.
Despite considerable advancements, existing methods often hinge on specific assumptions …