Repurposing diffusion-based image generators for monocular depth estimation

B Ke, A Obukhov, S Huang, N Metzger… - Proceedings of the …, 2024 - openaccess.thecvf.com
Monocular depth estimation is a fundamental computer vision task. Recovering 3D depth
from a single image is geometrically ill-posed and requires scene understanding, so it is not …

BinsFormer: Revisiting adaptive bins for monocular depth estimation

Z Li, X Wang, X Liu, J Jiang - IEEE Transactions on Image …, 2024 - ieeexplore.ieee.org
Monocular depth estimation (MDE) is a fundamental task in computer vision and has drawn
increasing attention. Recently, some methods reformulate it as a classification-regression …

Towards zero-shot scale-aware monocular depth estimation

V Guizilini, I Vasiljevic, D Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Monocular depth estimation is scale-ambiguous, and thus requires scale supervision to
produce metric predictions. Even so, the resulting models will be geometry-specific, with …

OmniVec: Learning robust representations with cross modal sharing

S Srivastava, G Sharma - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
The majority of research on learning-based methods has focused on designing and training
networks for specific tasks. However, many learning-based tasks, across modalities …

Single image depth prediction made better: A multivariate Gaussian take

C Liu, S Kumar, S Gu, R Timofte… - Proceedings of the …, 2023 - openaccess.thecvf.com
Neural-network-based single image depth prediction (SIDP) is a challenging task where the
goal is to predict the scene's per-pixel depth at test time. Since the problem, by definition, is …

CrossFuser: Multi-modal feature fusion for end-to-end autonomous driving under unseen weather conditions

W Wu, X Deng, P Jiang, S Wan… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Multi-modal fusion is a promising approach to boosting autonomous driving performance
and has already received a large amount of attention. Meanwhile, to increase driving …

WorDepth: Variational language prior for monocular depth estimation

Z Zeng, D Wang, F Yang, H Park… - Proceedings of the …, 2024 - openaccess.thecvf.com
Three-dimensional (3D) reconstruction from a single image is an ill-posed problem
with inherent ambiguities, i.e., scale. Predicting a 3D scene from text description(s) is similarly …

UniMod1K: Towards a More Universal Large-Scale Dataset and Benchmark for Multi-modal Learning

XF Zhu, T Xu, Z Liu, Z Tang, XJ Wu, J Kittler - International Journal of …, 2024 - Springer
The emergence of large-scale high-quality datasets has stimulated the rapid development of
deep learning in recent years. However, most computer vision tasks focus on the visual …

Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning

R Li, T Fischer, M Segu, M Pollefeys… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recovering the 3D scene geometry from a single view is a fundamental yet ill-posed
problem in computer vision. While classical depth estimation methods infer only a 2.5D …

Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion

F Zhang, S You, Y Li, Y Fu - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
Monocular depth estimation has experienced significant progress on terrestrial images in
recent years thanks to deep learning advancements. But it remains inadequate for …