PhysGen: Rigid-body physics-grounded image-to-video generation

S Liu, Z Ren, S Gupta, S Wang - European Conference on Computer …, 2025 - Springer
We present PhysGen, a novel image-to-video generation method that converts a single
image and an input condition (e.g., force and torque applied to an object in the image) to …

DepthCrafter: Generating consistent long depth sequences for open-world videos

W Hu, X Gao, X Li, S Zhao, X Cun, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite significant advancements in monocular depth estimation for static images,
estimating video depth in the open world remains challenging, since open-world videos are …

Lotus: Diffusion-based visual foundation model for high-quality dense prediction

J He, H Li, W Yin, Y Liang, L Li, K Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Leveraging the visual priors of pre-trained text-to-image diffusion models offers a promising
solution to enhance zero-shot generalization in dense prediction tasks. However, existing …

PuzzleAvatar: Assembling 3D avatars from personal albums

Y Xiu, Y Ye, Z Liu, D Tzionas, MJ Black - ACM Transactions on Graphics …, 2024 - dl.acm.org
Generating personalized 3D avatars is crucial for AR/VR. However, recent text-to-3D
methods that generate avatars for celebrities or fictional characters struggle with everyday …

The third monocular depth estimation challenge

J Spencer, F Tosi, M Poggi, RS Arora… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper discusses the results of the third edition of the Monocular Depth Estimation
Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging …

BetterDepth: Plug-and-play diffusion refiner for zero-shot monocular depth estimation

X Zhang, B Ke, H Riemenschneider, N Metzger… - arXiv preprint arXiv …, 2024 - arxiv.org
By training over large-scale datasets, zero-shot monocular depth estimation (MDE) methods
show robust performance in the wild but often suffer from insufficient detail. Although recent …

MoGe: Unlocking accurate monocular geometry estimation for open-domain images with optimal training supervision

R Wang, S Xu, C Dai, J Xiang, Y Deng, X Tong… - arXiv preprint arXiv …, 2024 - arxiv.org
We present MoGe, a powerful model for recovering 3D geometry from monocular open-domain
images. Given a single image, our model directly predicts a 3D point map of the …

ND-SDF: Learning normal deflection fields for high-fidelity indoor reconstruction

Z Tang, W Ye, Y Wang, D Huang, H Bao, T He… - arXiv preprint arXiv …, 2024 - arxiv.org
Neural implicit reconstruction via volume rendering has demonstrated its effectiveness in
recovering dense 3D surfaces. However, it is non-trivial to simultaneously recover …

Unleashing the potential of the diffusion model in few-shot semantic segmentation

M Zhu, Y Liu, Z Luo, C Jing, H Chen, G Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
The Diffusion Model has not only garnered noteworthy achievements in the realm of image
generation but has also demonstrated its potential as an effective pretraining method …

StereoCrafter: Diffusion-based generation of long and high-fidelity stereoscopic 3D from monocular videos

S Zhao, W Hu, X Cun, Y Zhang, X Li, Z Kong… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper presents a novel framework for converting 2D videos to immersive stereoscopic
3D, addressing the growing demand for 3D content in immersive experience. Leveraging …