Kubric: A scalable dataset generator

K Greff, F Belletti, L Beyer, C Doersch… - Proceedings of the …, 2022 - openaccess.thecvf.com

Data is the driving force of machine learning, with the amount and quality of training data
often being more important for the performance of a system than architecture and training …

Cited by 149 · Related articles · All 5 versions

SAVi++: Towards end-to-end object-centric learning from real-world videos

G Elsayed, A Mahendran… - Advances in …, 2022 - proceedings.neurips.cc

The visual world can be parsimoniously characterized in terms of distinct entities with sparse
interactions. Discovering this compositional structure in dynamic visual scenes has proven …

Cited by 104 · Related articles · All 7 versions

Conditional object-centric learning from video

T Kipf, GF Elsayed, A Mahendran, A Stone… - arXiv preprint arXiv …, 2021 - arxiv.org

Object-centric representations are a promising path toward more systematic generalization
by providing flexible abstractions upon which compositional world models can be built …

Cited by 174 · Related articles · All 3 versions

Object scene representation transformer

MSM Sajjadi, D Duckworth… - Advances in neural …, 2022 - proceedings.neurips.cc

A compositional understanding of the world in terms of objects and their geometry in 3D
space is considered a cornerstone of human cognition. Facilitating the learning of such a …

Cited by 84 · Related articles · All 8 versions

Simple unsupervised object-centric learning for complex and naturalistic videos

G Singh, YF Wu, S Ahn - Advances in Neural Information …, 2022 - proceedings.neurips.cc

Unsupervised object-centric learning aims to represent the modular, compositional, and
causal structure of a scene as a set of object representations and thereby promises to …

Cited by 83 · Related articles · All 7 versions

Illiterate DALL-E learns to compose

G Singh, F Deng, S Ahn - arXiv preprint arXiv:2110.11405, 2021 - arxiv.org

Although DALL-E has shown an impressive ability of composition-based systematic
generalization in image generation, it requires the dataset of text-image pairs and the …

Cited by 108 · Related articles · All 5 versions

Object discovery and representation networks

OJ Hénaff, S Koppula, E Shelhamer, D Zoran… - European conference on …, 2022 - Springer

The promise of self-supervised learning (SSL) is to leverage large amounts of unlabeled
data to solve complex tasks. While there has been excellent progress with simple, image …

Cited by 75 · Related articles · All 5 versions

Towards unsupervised object detection from LiDAR point clouds

L Zhang, AJ Yang, Y Xiong, S Casas… - Proceedings of the …, 2023 - openaccess.thecvf.com

In this paper, we study the problem of unsupervised object detection from 3D point clouds in
self-driving scenes. We present a simple yet effective method that exploits (i) point clustering …

Cited by 20 · Related articles · All 5 versions

SlotFormer: Unsupervised visual dynamics simulation with object-centric models

Z Wu, N Dvornik, K Greff, T Kipf, A Garg - arXiv preprint arXiv:2210.05861, 2022 - arxiv.org

Understanding dynamics from visual observations is a challenging problem that requires
disentangling individual objects from the scene and learning their interactions. While recent …

Cited by 59 · Related articles · All 4 versions

Decomposing 3D scenes into objects via unsupervised volume segmentation

K Stelzner, K Kersting, AR Kosiorek - arXiv preprint arXiv:2104.01148, 2021 - arxiv.org

We present ObSuRF, a method which turns a single image of a scene into a 3D model
represented as a set of Neural Radiance Fields (NeRFs), with each NeRF corresponding to …

Cited by 92 · Related articles · All 3 versions