GeoWizard: Unleashing the diffusion priors for 3D geometry estimation from a single image

S Liu, Z Ren, S Gupta, S Wang - European Conference on Computer …, 2025 - Springer

We present PhysGen, a novel image-to-video generation method that converts a single
image and an input condition (e.g., force and torque applied to an object in the image) to …

Cited by 14 · Related articles · All 8 versions

DepthCrafter: Generating consistent long depth sequences for open-world videos

W Hu, X Gao, X Li, S Zhao, X Cun, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

Despite significant advancements in monocular depth estimation for static images,
estimating video depth in the open world remains challenging, since open-world videos are …

Cited by 21 · Related articles · All 2 versions

Lotus: Diffusion-based visual foundation model for high-quality dense prediction

J He, H Li, W Yin, Y Liang, L Li, K Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org

Leveraging the visual priors of pre-trained text-to-image diffusion models offers a promising
solution to enhance zero-shot generalization in dense prediction tasks. However, existing …

Cited by 15 · Related articles

PuzzleAvatar: Assembling 3D avatars from personal albums

Y Xiu, Y Ye, Z Liu, D Tzionas, MJ Black - ACM Transactions on Graphics …, 2024 - dl.acm.org

Generating personalized 3D avatars is crucial for AR/VR. However, recent text-to-3D
methods that generate avatars for celebrities or fictional characters, struggle with everyday …

Cited by 3 · Related articles · All 2 versions

The third monocular depth estimation challenge

J Spencer, F Tosi, M Poggi, RS Arora… - Proceedings of the …, 2024 - openaccess.thecvf.com

This paper discusses the results of the third edition of the Monocular Depth Estimation
Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging …

Cited by 4 · Related articles · All 7 versions

BetterDepth: Plug-and-play diffusion refiner for zero-shot monocular depth estimation

X Zhang, B Ke, H Riemenschneider, N Metzger… - arXiv preprint arXiv …, 2024 - arxiv.org

By training over large-scale datasets, zero-shot monocular depth estimation (MDE) methods
show robust performance in the wild but often suffer from insufficient detail. Although recent …

Cited by 6 · Related articles · All 3 versions

MoGe: Unlocking accurate monocular geometry estimation for open-domain images with optimal training supervision

R Wang, S Xu, C Dai, J Xiang, Y Deng, X Tong… - arXiv preprint arXiv …, 2024 - arxiv.org

We present MoGe, a powerful model for recovering 3D geometry from monocular open-domain
images. Given a single image, our model directly predicts a 3D point map of the …

Cited by 3 · Related articles

ND-SDF: Learning normal deflection fields for high-fidelity indoor reconstruction

Z Tang, W Ye, Y Wang, D Huang, H Bao, T He… - arXiv preprint arXiv …, 2024 - arxiv.org

Neural implicit reconstruction via volume rendering has demonstrated its effectiveness in
recovering dense 3D surfaces. However, it is non-trivial to simultaneously recover …

Cited by 3 · Related articles

Unleashing the potential of the diffusion model in few-shot semantic segmentation

M Zhu, Y Liu, Z Luo, C Jing, H Chen, G Xu… - arXiv preprint arXiv …, 2024 - arxiv.org

The Diffusion Model has not only garnered noteworthy achievements in the realm of image
generation but has also demonstrated its potential as an effective pretraining method …

Cited by 2 · Related articles · All 3 versions

StereoCrafter: Diffusion-based generation of long and high-fidelity stereoscopic 3D from monocular videos

S Zhao, W Hu, X Cun, Y Zhang, X Li, Z Kong… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper presents a novel framework for converting 2D videos to immersive stereoscopic
3D, addressing the growing demand for 3D content in immersive experience. Leveraging …

Cited by 2 · Related articles · All 3 versions