Factoring shape, pose, and layout from the 2D image of a 3D scene

J Parente, E Rodrigues, B Rangel, JP Martins - Journal of Building …, 2023 - Elsevier

Convolutional and adversarial networks are found in various fields of knowledge and
activities. One such field is building design, a multi-disciplinary and multi-task process …

Cited by 13 · Related articles · All 4 versions

[PDF] thecvf.com

GRF: Learning a general radiance field for 3D representation and rendering

A Trevithick, B Yang - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com

We present a simple yet powerful neural network that implicitly represents and renders 3D
objects and scenes only from 2D observations. The network models 3D geometries as a …

Cited by 278 · Related articles · All 6 versions

[PDF] thecvf.com

SynSin: End-to-end view synthesis from a single image

O Wiles, G Gkioxari, R Szeliski… - Proceedings of the …, 2020 - openaccess.thecvf.com

View synthesis allows for the generation of new views of a scene given one or more images.
This is challenging; it requires comprehensively understanding the 3D scene from images …

Cited by 463 · Related articles · All 10 versions

[PDF] thecvf.com

Omni3D: A large benchmark and model for 3D object detection in the wild

G Brazil, A Kumar, J Straub, N Ravi… - Proceedings of the …, 2023 - openaccess.thecvf.com

Recognizing scenes and objects in 3D from a single image is a longstanding goal of
computer vision with applications in robotics and AR/VR. For 2D recognition, large datasets …

Cited by 85 · Related articles · All 6 versions

[PDF] arxiv.org

Image-based 3D object reconstruction: State-of-the-art and trends in the deep learning era

XF Han, H Laga, M Bennamoun - IEEE transactions on pattern …, 2019 - ieeexplore.ieee.org

3D reconstruction is a longstanding ill-posed problem, which has been explored for decades
by the computer vision, computer graphics, and machine learning communities. Since 2015 …

Cited by 480 · Related articles · All 9 versions

[HTML] sciencedirect.com

[HTML] DILF: Differentiable rendering-based multi-view Image–Language Fusion for zero-shot 3D shape understanding

X Ning, Z Yu, L Li, W Li, P Tiwari - Information Fusion, 2024 - Elsevier

Zero-shot 3D shape understanding aims to recognize “unseen” 3D categories that are not
present in training data. Recently, Contrastive Language–Image Pre-training (CLIP) has …

Cited by 51 · Related articles · All 5 versions

[PDF] thecvf.com

Total3DUnderstanding: Joint layout, object pose and mesh reconstruction for indoor scenes from a single image

Y Nie, X Han, S Guo, Y Zheng… - Proceedings of the …, 2020 - openaccess.thecvf.com

Semantic reconstruction of indoor scenes refers to both scene understanding and object
reconstruction. Existing works either address one part of this problem or focus on …

Cited by 248 · Related articles · All 9 versions

[PDF] arxiv.org

Shape and viewpoint without keypoints

S Goel, A Kanazawa, J Malik - … Conference, Glasgow, UK, August 23–28 …, 2020 - Springer

We present a learning framework that learns to recover the 3D shape, pose and texture from
a single image, trained on an image collection without any ground truth 3D shape, multi …

Cited by 165 · Related articles · All 5 versions

[PDF] thecvf.com

Reconstructing hand-object interactions in the wild

Z Cao, I Radosavovic… - Proceedings of the …, 2021 - openaccess.thecvf.com

We study the problem of understanding hand-object interactions from 2D images in the wild.
This requires reconstructing both the hand and the object in 3D, which is challenging …

Cited by 151 · Related articles · All 5 versions

[PDF] arxiv.org

Perceiving 3D human-object spatial arrangements from a single image in the wild

JY Zhang, S Pepose, H Joo, D Ramanan… - Computer Vision–ECCV …, 2020 - Springer

We present a method that infers spatial arrangements and shapes of humans and objects in
a globally consistent 3D scene, all from a single image in-the-wild captured in an …

Cited by 145 · Related articles · All 5 versions