A survey of embodied AI: From simulators to research tasks

J Duan, S Yu, HL Tan, H Zhu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
There has been an emerging paradigm shift from the era of “internet AI” to “embodied AI,”
where AI algorithms and agents no longer learn from datasets of images, videos or text …

Benchmarks for automated commonsense reasoning: A survey

E Davis - ACM Computing Surveys, 2023 - dl.acm.org
More than one hundred benchmarks have been developed to test the commonsense
knowledge and commonsense reasoning abilities of artificial intelligence (AI) systems …

CLEVRER: Collision events for video representation and reasoning

K Yi, C Gan, Y Li, P Kohli, J Wu, A Torralba… - arXiv preprint arXiv …, 2019 - arxiv.org
The ability to reason about temporal and causal events from videos lies at the core of human
intelligence. Most video reasoning benchmarks, however, focus on pattern recognition from …

When physics meets machine learning: A survey of physics-informed machine learning

C Meng, S Seo, D Cao, S Griesemer, Y Liu - arXiv preprint arXiv …, 2022 - arxiv.org
Physics-informed machine learning (PIML), referring to the combination of prior knowledge
of physics, which is the high level abstraction of natural phenomenons and human …

CausalWorld: A robotic manipulation benchmark for causal structure and transfer learning

O Ahmed, F Träuble, A Goyal, A Neitz, Y Bengio… - arXiv preprint arXiv …, 2020 - arxiv.org
Despite recent successes of reinforcement learning (RL), it remains a challenge for agents
to transfer learned skills to related environments. To facilitate research addressing this …

CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning

R Girdhar, D Ramanan - arXiv preprint arXiv:1910.04744, 2019 - arxiv.org
Computer vision has undergone a dramatic revolution in performance, driven in large part
through deep features trained on large-scale supervised datasets. However, much of these …

Dynamic visual reasoning by learning differentiable physics models from video and language

M Ding, Z Chen, T Du, P Luo… - Advances in Neural …, 2021 - proceedings.neurips.cc
In this work, we propose a unified framework, called Visual Reasoning with Differentiable
Physics (VRDP), that can jointly learn visual concepts and infer physics models of objects …

Capturing the objects of vision with neural networks

B Peters, N Kriegeskorte - Nature human behaviour, 2021 - nature.com
Human visual perception carves a scene at its physical joints, decomposing the world into
objects, which are selectively attended, tracked and predicted as we engage our …

Grounding physical concepts of objects and events through dynamic visual reasoning

Z Chen, J Mao, J Wu, KYK Wong… - arXiv preprint arXiv …, 2021 - arxiv.org
We study the problem of dynamic visual reasoning on raw videos. This is a challenging
problem; currently, state-of-the-art models often require dense supervision on physical …

Learning what makes a difference from counterfactual examples and gradient supervision

D Teney, E Abbasnedjad, A van den Hengel - Computer Vision–ECCV …, 2020 - Springer
One of the primary challenges limiting the applicability of deep learning is its susceptibility to
learning spurious correlations rather than the underlying mechanisms of the task of interest …