The free energy principle made simpler but not too simple
This paper provides a concise description of the free energy principle, starting from a
formulation of random dynamical systems in terms of a Langevin equation and ending with a …
Continual learning for robotics: Definition, framework, learning strategies, opportunities and challenges
Continual learning (CL) is a particular machine learning paradigm where the data
distribution and learning objective change through time, or where all the training data and …
Is pessimism provably efficient for offline RL?
We study offline reinforcement learning (RL), which aims to learn an optimal policy based on
a dataset collected a priori. Due to the lack of further interactions with the environment …
Planning to explore via self-supervised world models
Reinforcement learning allows solving complex tasks; however, the learning tends to be task-
specific and the sample efficiency remains a challenge. We present Plan2Explore, a self …
An introduction to deep reinforcement learning
Deep reinforcement learning is the combination of reinforcement learning (RL) and deep
learning. This field of research has been able to solve a wide range of complex …
Graph networks as learnable physics engines for inference and control
A Sanchez-Gonzalez, N Heess… - International …, 2018 - proceedings.mlr.press
Understanding and interacting with everyday physical scenes requires rich knowledge
about the structure of the world, represented either implicitly in a value or policy function, or …
Curiosity-driven exploration by self-supervised prediction
In many real-world scenarios, rewards extrinsic to the agent are extremely sparse, or absent
altogether. In such cases, curiosity can serve as an intrinsic reward signal to enable the …
Self-supervised exploration via disagreement
Efficient exploration is a long-standing problem in sensorimotor learning. Major advances
have been demonstrated in noise-free, non-stochastic domains such as video games and …
Discovering and achieving goals via world models
How can artificial agents learn to solve many diverse tasks in complex visual environments
without any supervision? We decompose this question into two challenges: discovering new …
#Exploration: A study of count-based exploration for deep reinforcement learning
Count-based exploration algorithms are known to perform near-optimally when used in
conjunction with tabular reinforcement learning (RL) methods for solving small discrete …