VA-DepthNet: A variational approach to single image depth prediction

B Ke, A Obukhov, S Huang, N Metzger… - Proceedings of the …, 2024 - openaccess.thecvf.com

Monocular depth estimation is a fundamental computer vision task. Recovering 3D depth
from a single image is geometrically ill-posed and requires scene understanding so it is not …

Cited by 79 - Related articles - All 3 versions

BinsFormer: Revisiting adaptive bins for monocular depth estimation

Z Li, X Wang, X Liu, J Jiang - IEEE Transactions on Image …, 2024 - ieeexplore.ieee.org

Monocular depth estimation (MDE) is a fundamental task in computer vision and has drawn
increasing attention. Recently, some methods reformulate it as a classification-regression …

Cited by 146 - Related articles - All 2 versions

Towards zero-shot scale-aware monocular depth estimation

V Guizilini, I Vasiljevic, D Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com

Monocular depth estimation is scale-ambiguous, and thus requires scale supervision to
produce metric predictions. Even so, the resulting models will be geometry-specific, with …

Cited by 37 - Related articles - All 5 versions

OmniVec: Learning robust representations with cross modal sharing

S Srivastava, G Sharma - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com

Majority of research in learning based methods has been towards designing and training
networks for specific tasks. However, many of the learning based tasks, across modalities …

Cited by 46 - Related articles - All 5 versions

Single image depth prediction made better: A multivariate Gaussian take

C Liu, S Kumar, S Gu, R Timofte… - Proceedings of the …, 2023 - openaccess.thecvf.com

Neural-network-based single image depth prediction (SIDP) is a challenging task where the
goal is to predict the scene's per-pixel depth at test time. Since the problem, by definition, is …

Cited by 18 - Related articles - All 7 versions

CrossFuser: Multi-modal feature fusion for end-to-end autonomous driving under unseen weather conditions

W Wu, X Deng, P Jiang, S Wan… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Multi-modal fusion is a promising approach to boost the autonomous driving performance
and has already received a large amount of attention. Meanwhile, to increase driving …

Cited by 18 - Related articles - All 3 versions

WorDepth: Variational language prior for monocular depth estimation

Z Zeng, D Wang, F Yang, H Park… - Proceedings of the …, 2024 - openaccess.thecvf.com

Three-dimensional (3D) reconstruction from a single image is an ill-posed problem
with inherent ambiguities, i.e., scale. Predicting a 3D scene from text description(s) is similarly …

Cited by 11 - Related articles - All 3 versions

UniMod1K: Towards a More Universal Large-Scale Dataset and Benchmark for Multi-modal Learning

XF Zhu, T Xu, Z Liu, Z Tang, XJ Wu, J Kittler - International Journal of …, 2024 - Springer

The emergence of large-scale high-quality datasets has stimulated the rapid development of
deep learning in recent years. However, most computer vision tasks focus on the visual …

Cited by 7 - Related articles

Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning

R Li, T Fischer, M Segu, M Pollefeys… - Proceedings of the …, 2024 - openaccess.thecvf.com

Recovering the 3D scene geometry from a single view is a fundamental yet ill-posed
problem in computer vision. While classical depth estimation methods infer only a 2.5D …

Cited by 1 - Related articles - All 3 versions

Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion

F Zhang, S You, Y Li, Y Fu - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com

Monocular depth estimation has experienced significant progress on terrestrial images in
recent years thanks to deep learning advancements. But it remains inadequate for …

Cited by 3 - Related articles - All 3 versions