Scaling laws for reward model overoptimization
In reinforcement learning from human feedback, it is common to optimize against a reward
model trained to predict human preferences. Because the reward model is an imperfect …
A survey of zero-shot generalisation in deep reinforcement learning
The study of zero-shot generalisation (ZSG) in deep Reinforcement Learning (RL) aims to
produce RL algorithms whose policies generalise well to novel unseen situations at …
Deep transfer learning approaches for Monkeypox disease diagnosis
Monkeypox has become a significant global challenge as the number of cases increases
daily. Those infected with the disease often display various skin symptoms and can spread …
MOPO: Model-based offline policy optimization
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a
batch of previously collected data. This problem setting is compelling, because it offers the …
Leveraging procedural generation to benchmark reinforcement learning
We introduce Procgen Benchmark, a suite of 16 procedurally generated game-like
environments designed to benchmark both sample efficiency and generalization in …
An introduction to deep reinforcement learning
Deep reinforcement learning is the combination of reinforcement learning (RL) and deep
learning. This field of research has been able to solve a wide range of complex …
Causal reinforcement learning: A survey
Reinforcement learning is an essential paradigm for solving sequential decision problems
under uncertainty. Despite many remarkable achievements in recent decades, applying …
Evolving curricula with regret-based environment design
Training generally-capable agents with reinforcement learning (RL) remains a significant
challenge. A promising avenue for improving the robustness of RL agents is through the use …
On the measure of intelligence
F Chollet - arXiv preprint arXiv:1911.01547, 2019 - arxiv.org
To make deliberate progress towards more intelligent and more human-like artificial
systems, we need to be following an appropriate feedback signal: we need to be able to …
Contrastive behavioral similarity embeddings for generalization in reinforcement learning
Reinforcement learning methods trained on few environments rarely learn policies that
generalize to unseen environments. To improve generalization, we incorporate the inherent …