Repurposing diffusion-based image generators for monocular depth estimation
Monocular depth estimation is a fundamental computer vision task. Recovering 3D depth
from a single image is geometrically ill-posed and requires scene understanding so it is not …
Binsformer: Revisiting adaptive bins for monocular depth estimation
Monocular depth estimation (MDE) is a fundamental task in computer vision and has drawn
increasing attention. Recently, some methods reformulate it as a classification-regression …
Towards zero-shot scale-aware monocular depth estimation
Monocular depth estimation is scale-ambiguous, and thus requires scale supervision to
produce metric predictions. Even so, the resulting models will be geometry-specific, with …
Omnivec: Learning robust representations with cross modal sharing
S Srivastava, G Sharma - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
Majority of research in learning based methods has been towards designing and training
networks for specific tasks. However, many of the learning based tasks, across modalities …
Single image depth prediction made better: A multivariate gaussian take
Neural-network-based single image depth prediction (SIDP) is a challenging task where the
goal is to predict the scene's per-pixel depth at test time. Since the problem, by definition, is …
Crossfuser: Multi-modal feature fusion for end-to-end autonomous driving under unseen weather conditions
Multi-modal fusion is a promising approach to boost the autonomous driving performance
and has already received a large amount of attention. Meanwhile, to increase driving …
Wordepth: Variational language prior for monocular depth estimation
Abstract Three-dimensional (3D) reconstruction from a single image is an ill-posed problem
with inherent ambiguities, i.e., scale. Predicting a 3D scene from text description(s) is similarly …
UniMod1K: Towards a More Universal Large-Scale Dataset and Benchmark for Multi-modal Learning
The emergence of large-scale high-quality datasets has stimulated the rapid development of
deep learning in recent years. However, most computer vision tasks focus on the visual …
Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning
Recovering the 3D scene geometry from a single view is a fundamental yet ill-posed
problem in computer vision. While classical depth estimation methods infer only a 2.5D …
Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion
Monocular depth estimation has experienced significant progress on terrestrial images in
recent years thanks to deep learning advancements. But it remains inadequate for …