A review of uncertainty for deep reinforcement learning
O Lockwood, M Si - Proceedings of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org
Uncertainty is ubiquitous in games, both in the agents playing games and often in the games
themselves. Working with uncertainty is therefore an important component of successful …
COMBO: Conservative offline model-based policy optimization
Model-based reinforcement learning (RL) algorithms, which learn a dynamics
model from logged experience and perform conservative planning under the learned model …
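To make the conservative flavor concrete, here is a minimal sketch of a CQL-style critic penalty applied to model-generated samples. This is not the paper's exact objective; all names (q_net, s_model, td_target, beta) are illustrative assumptions.

import torch
import torch.nn.functional as F

def conservative_critic_loss(q_net, s_data, a_data, s_model, a_model,
                             td_target, beta=1.0):
    q_data = q_net(s_data, a_data)           # Q on logged transitions
    q_model = q_net(s_model, a_model)        # Q on model rollouts
    bellman = F.mse_loss(q_data, td_target)  # standard TD regression
    # Push Q down on synthetic (model) samples and up on real data,
    # keeping learned values pessimistic off the data distribution.
    penalty = q_model.mean() - q_data.mean()
    return bellman + beta * penalty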
RvS: What is essential for offline RL via supervised learning?
Recent work has shown that supervised learning alone, without temporal difference (TD)
learning, can be remarkably effective for offline RL. When does this hold true, and which …
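The "supervised learning alone" recipe amounts to outcome-conditioned behavior cloning: regress onto logged actions given the state and a desired outcome, with no TD bootstrapping anywhere. A hedged sketch (module and variable names are assumptions, not the authors' code):

import torch
import torch.nn as nn

class ConditionedPolicy(nn.Module):
    def __init__(self, state_dim, outcome_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + outcome_dim, 256), nn.ReLU(),
            nn.Linear(256, act_dim))

    def forward(self, state, outcome):
        # Condition on the desired outcome (e.g. reward-to-go or goal).
        return self.net(torch.cat([state, outcome], dim=-1))

def supervised_loss(policy, state, outcome, action):
    # Plain regression onto logged actions: no TD learning involved.
    return ((policy(state, outcome) - action) ** 2).mean()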
Mildly conservative Q-learning for offline reinforcement learning
Offline reinforcement learning (RL) defines the task of learning from a static logged dataset
without continually interacting with the environment. The distribution shift between the …
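The "mild" conservatism can be illustrated as follows: rather than pushing out-of-distribution actions toward arbitrarily low values, they are trained toward a pseudo-target just below the best in-support value. A sketch under assumed names, not the authors' implementation:

import torch

def mild_pseudo_target(q_net, state, in_support_actions, delta=0.05):
    # Best Q-value among actions the behavior policy actually supports.
    q_vals = torch.stack([q_net(state, a) for a in in_support_actions])
    return q_vals.max(dim=0).values - delta  # mildly pessimistic target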
Provable benefits of actor-critic methods for offline reinforcement learning
A Zanette, MJ Wainwright… - Advances in neural …, 2021 - proceedings.neurips.cc
Actor-critic methods are widely used in offline reinforcement learning practice, but are not so
well-understood theoretically. We propose a new offline actor-critic algorithm that naturally …
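A generic way to make an actor-critic update pessimistic, in the spirit described here (a sketch, not this paper's algorithm): have the actor maximize a lower confidence bound over an ensemble of critics. All names are illustrative.

import torch

def pessimistic_actor_loss(critics, policy, states, kappa=1.0):
    actions = policy(states)
    qs = torch.stack([c(states, actions) for c in critics])  # [K, B]
    lcb = qs.mean(dim=0) - kappa * qs.std(dim=0)  # value lower bound
    return -lcb.mean()  # ascend the pessimistic value estimate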
Pessimistic bootstrapping for uncertainty-driven offline reinforcement learning
Offline Reinforcement Learning (RL) aims to learn policies from previously collected
datasets without exploring the environment. Directly applying off-policy algorithms to offline …
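The uncertainty-driven pessimism can be sketched as a TD target shifted down by the disagreement of a bootstrapped Q ensemble, so poorly covered state-action pairs receive pessimistic values. Variable names below are assumptions:

import torch

def penalized_target(target_critics, rewards, next_s, next_a,
                     gamma=0.99, beta=1.0):
    qs = torch.stack([c(next_s, next_a) for c in target_critics])
    penalty = beta * qs.std(dim=0)   # ensemble disagreement as an
                                     # epistemic-uncertainty proxy
    return rewards + gamma * (qs.mean(dim=0) - penalty)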
Reward model ensembles help mitigate overoptimization
Reinforcement learning from human feedback (RLHF) is a standard approach for fine-tuning
large language models to follow instructions. As part of this process, learned reward models …
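The ensemble idea reduces to scoring each completion with several independently trained reward models and aggregating conservatively, so the policy cannot exploit any single model's idiosyncratic errors. A minimal sketch, where reward_models is an assumed list of callables:

import numpy as np

def conservative_reward(reward_models, prompt, completion, k=1.0):
    scores = np.array([rm(prompt, completion) for rm in reward_models])
    # Mean-minus-std discounts rewards the models disagree on;
    # taking the min is a common, even more conservative variant.
    return scores.mean() - k * scores.std()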
Iterative preference learning from human feedback: Bridging theory and practice for RLHF under KL-constraint
This paper studies the theoretical framework of the alignment process of generative models
with Reinforcement Learning from Human Feedback (RLHF). We consider a standard …
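The standard KL-constrained RLHF objective referenced here maximizes E[r(x, y)] - beta * KL(pi || pi_ref), which in practice is often implemented as a shaped per-sample reward. A sketch with illustrative names:

import torch

def kl_shaped_reward(reward, logp_policy, logp_ref, beta=0.1):
    # r(x, y) - beta * [log pi(y|x) - log pi_ref(y|x)]: the log-ratio
    # term keeps the tuned policy close to the reference model.
    return reward - beta * (logp_policy - logp_ref)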
A policy-guided imitation approach for offline reinforcement learning
Offline reinforcement learning (RL) methods can generally be categorized into two types: RL-
based and Imitation-based. RL-based methods could in principle enjoy out-of-distribution …
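A rough sketch of the policy-guided decomposition this abstract points to: a "guide" proposes a desirable next state, and an "execute" policy imitates the action that reaches it. Module names are assumptions for illustration.

def guided_action(guide, execute, state):
    target_state = guide(state)           # where should we go next?
    return execute(state, target_state)   # imitation-style controller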
Conformal prediction for uncertainty-aware planning with diffusion dynamics model
Robotic applications often involve working in environments that are uncertain, dynamic, and
partially observable. Recently, diffusion models have been proposed for learning trajectory …
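As background for the uncertainty-aware planning described here, a minimal split conformal calibration sketch: fit a residual quantile on held-out transitions, then wrap each dynamics prediction in a ball with approximate 1 - alpha coverage. Names are assumed, not from the paper.

import numpy as np

def calibrate(predict, states, actions, next_states, alpha=0.1):
    # Residuals between predicted and true next states on held-out data.
    residuals = np.linalg.norm(
        predict(states, actions) - next_states, axis=-1)
    n = len(residuals)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(residuals, level)  # conformal radius around
                                          # each dynamics prediction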