VD3D: Taming large video diffusion transformers for 3D camera control
Modern text-to-video synthesis models demonstrate coherent, photorealistic generation of
complex videos from a text description. However, most existing models lack fine-grained …
Splatt3R: Zero-shot Gaussian splatting from uncalibrated image pairs
In this paper, we introduce Splatt3R, a pose-free, feed-forward method for in-the-wild 3D
reconstruction and novel view synthesis from stereo pairs. Given uncalibrated natural …
ReconX: Reconstruct any scene from sparse views with video diffusion model
Advancements in 3D scene reconstruction have transformed 2D images from the real world
into 3D models, producing realistic 3D results from hundreds of input photos. Despite great …
MVSplat360: Feed-forward 360 scene synthesis from sparse views
We introduce MVSplat360, a feed-forward approach for 360° novel view synthesis
(NVS) of diverse real-world scenes, using only sparse observations. This setting is …
MultiDepth: Multi-Sample Priors for Refining Monocular Metric Depth Estimations in Indoor Scenes
S Byun, J Song, WS Chung - arXiv preprint arXiv:2411.01048, 2024 - arxiv.org
Monocular metric depth estimation (MMDE) is a crucial task to solve for indoor scene
reconstruction on edge devices. Despite this importance, existing models are sensitive to …
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
3D Semantic Occupancy Prediction is fundamental for spatial understanding as it provides a
comprehensive semantic cognition of surrounding environments. However, prevalent …
Novel View Synthesis with Pixel-Space Diffusion Models
Synthesizing a novel view from a single input image is a challenging task. Traditionally, this
task was approached by estimating scene depth, warping, and inpainting, with machine …
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers
Numerous works have recently integrated 3D camera control into foundational text-to-video
models, but the resulting camera control is often imprecise, and video generation quality …
Generative Densification: Learning to Densify Gaussians for High-Fidelity Generalizable 3D Reconstruction
Generalized feed-forward Gaussian models have achieved significant progress in sparse-
view 3D reconstruction by leveraging prior knowledge from large multi-view datasets …
A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision
We introduce a diffusion model for Gaussian Splats, SplatDiffusion, to enable generation of
three-dimensional structures from single images, addressing the ill-posed nature of lifting 2D …