PhysGen: Rigid-body physics-grounded image-to-video generation
We present PhysGen, a novel image-to-video generation method that converts a single
image and an input condition (e.g., force and torque applied to an object in the image) to …
Depthcrafter: Generating consistent long depth sequences for open-world videos
Despite significant advancements in monocular depth estimation for static images,
estimating video depth in the open world remains challenging, since open-world videos are …
Lotus: Diffusion-based visual foundation model for high-quality dense prediction
Leveraging the visual priors of pre-trained text-to-image diffusion models offers a promising
solution to enhance zero-shot generalization in dense prediction tasks. However, existing …
Puzzleavatar: Assembling 3D avatars from personal albums
Generating personalized 3D avatars is crucial for AR/VR. However, recent text-to-3D
methods that generate avatars for celebrities or fictional characters, struggle with everyday …
The third monocular depth estimation challenge
This paper discusses the results of the third edition of the Monocular Depth Estimation
Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging …
Betterdepth: Plug-and-play diffusion refiner for zero-shot monocular depth estimation
By training over large-scale datasets, zero-shot monocular depth estimation (MDE) methods
show robust performance in the wild but often suffer from insufficient detail. Although recent …
MoGe: Unlocking accurate monocular geometry estimation for open-domain images with optimal training supervision
We present MoGe, a powerful model for recovering 3D geometry from monocular open-
domain images. Given a single image, our model directly predicts a 3D point map of the …
Nd-sdf: Learning normal deflection fields for high-fidelity indoor reconstruction
Neural implicit reconstruction via volume rendering has demonstrated its effectiveness in
recovering dense 3D surfaces. However, it is non-trivial to simultaneously recover …
Unleashing the potential of the diffusion model in few-shot semantic segmentation
The Diffusion Model has not only garnered noteworthy achievements in the realm of image
generation but has also demonstrated its potential as an effective pretraining method …
Stereocrafter: Diffusion-based generation of long and high-fidelity stereoscopic 3D from monocular videos
This paper presents a novel framework for converting 2D videos to immersive stereoscopic
3D, addressing the growing demand for 3D content in immersive experience. Leveraging …