SceneVerse: Scaling 3D vision-language learning for grounded scene understanding
3D vision-language (3D-VL) grounding, which aims to align language with 3D
physical environments, stands as a cornerstone in developing embodied agents. In …
AnyHome: Open-vocabulary generation of structured and textured 3D homes
Inspired by cognitive theories, we introduce AnyHome, a framework that translates any text
into well-structured and textured indoor scenes at house scale. By prompting Large …
PhyScene: Physically interactable 3D scene synthesis for embodied AI
With recent developments in Embodied Artificial Intelligence (EAI) research, there has been
a growing demand for high-quality large-scale interactive scene generation. While prior …
PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators
We present PoliFormer (Policy Transformer), an RGB-only indoor navigation agent trained
end-to-end with reinforcement learning at scale that generalizes to the real world without …
SceneMotifCoder: Example-driven visual program learning for generating 3D object arrangements
Despite advances in text-to-3D generation methods, generation of multi-object
arrangements remains challenging. Current methods exhibit failures in generating …
Seeing the Unseen: Visual Common Sense for Semantic Placement
Computer vision tasks typically involve describing what is visible in an image (e.g.,
classification, detection, segmentation, and captioning). We study a visual common sense task …
Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation
We introduce Infinigen Indoors, a Blender-based procedural generator of
photorealistic indoor scenes. It builds upon the existing Infinigen system, which focuses on …
ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments
Simulated virtual environments have been widely used to learn robotic agents that perform
daily household tasks. These environments have so far encouraged research progress, but often …
Pre-trained text-to-image diffusion models are versatile representation learners for control
Embodied AI agents require a fine-grained understanding of the physical world mediated
through visual and language inputs. Such capabilities are difficult to learn solely from task …
S2O: Static to openable enhancement for articulated 3D objects
Despite much progress in large 3D datasets, there are currently few interactive 3D object
datasets, and their scale is limited due to the manual effort required in their construction. We …