Shadows Don't Lie and Lines Can't Bend! Generative Models don't know Projective Geometry... for now

A Sarkar, H Mai, A Mahapatra… - Proceedings of the …, 2024 - openaccess.thecvf.com
Generative models can produce impressively realistic images. This paper demonstrates that
generated images have geometric features different from those of real images. We build a …

Amodal ground truth and completion in the wild

G Zhan, C Zheng, W Xie… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
This paper studies amodal image segmentation: predicting entire object segmentation
masks including both visible and invisible (occluded) parts. In previous work, the amodal …

Generative models: What do they know? Do they know things? Let's find out!

X Du, N Kolkin, G Shakhnarovich, A Bhattad - arXiv preprint arXiv …, 2023 - arxiv.org
Generative models excel at mimicking real scenes, suggesting they might inherently encode
important intrinsic scene properties. In this paper, we aim to explore the following key …

LightIt: Illumination modeling and control for diffusion models

P Kocsis, J Philip, K Sunkavalli… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce LightIt, a method for explicit illumination control for image generation. Recent
generative methods lack lighting control which is crucial to numerous artistic aspects of …

Faster diffusion: Rethinking the role of UNet encoder in diffusion models

S Li, T Hu, F Shahbaz Khan, L Li, S Yang… - arXiv e …, 2023 - ui.adsabs.harvard.edu
One of the key components within diffusion models is the UNet for noise prediction. While
several works have explored basic properties of the UNet decoder, its encoder largely …

Object pose estimation via the aggregation of diffusion features

T Wang, G Hu, H Wang - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Estimating the pose of objects from images is a crucial task of 3D scene understanding and
recent approaches have shown promising results on very large benchmarks. However, these …

Amodal completion via progressive mixed context diffusion

K Xu, L Zhang, J Shi - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Our brain can effortlessly recognize objects even when partially hidden from view. Seeing
the visible of the hidden is called amodal completion; however, this task remains a challenge …

Lexicon3D: Probing visual foundation models for complex 3D scene understanding

Y Man, S Zheng, Z Bao, M Hebert, LY Gui… - arXiv preprint arXiv …, 2024 - arxiv.org
Complex 3D scene understanding has gained increasing attention, with scene encoding
strategies playing a crucial role in this success. However, the optimal scene encoding …

Can Visual Foundation Models Achieve Long-term Point Tracking?

G Aydemir, W Xie, F Güney - arXiv preprint arXiv:2408.13575, 2024 - arxiv.org
Large-scale vision foundation models have demonstrated remarkable success across
various tasks, underscoring their robust generalization capabilities. While their proficiency in …

Viewpoint Textual Inversion: Discovering Scene Representations and 3D View Control in 2D Diffusion Models

J Burgess, KC Wang, S Yeung-Levy - European Conference on Computer …, 2025 - Springer
Text-to-image diffusion models generate impressive and realistic images, but do they learn
to represent the 3D world from only 2D supervision? We demonstrate that yes, certain 3D …