A review of uncertainty for deep reinforcement learning

O Lockwood, M Si - Proceedings of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org
Uncertainty is ubiquitous in games, both in the agents playing games and often in the games
themselves. Working with uncertainty is therefore an important component of successful …

COMBO: Conservative offline model-based policy optimization

T Yu, A Kumar, R Rafailov… - Advances in neural …, 2021 - proceedings.neurips.cc
Abstract Model-based reinforcement learning (RL) algorithms, which learn a dynamics
model from logged experience and perform conservative planning under the learned model …
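
To make the snippet's "conservative planning under the learned model" concrete, here is a minimal sketch of the general recipe: augment a standard Bellman loss with a penalty that lowers Q-values on model-generated state-action pairs relative to logged ones. This illustrates the idea, not COMBO's exact objective; network sizes and the toy batches are placeholders.

```python
# Hedged sketch of a conservative model-based critic update (illustrative,
# not the paper's exact objective). model rollouts (s_m, a_m) stand in for
# states/actions generated by a learned dynamics model.
import torch
import torch.nn as nn

state_dim, action_dim, beta, gamma = 4, 2, 1.0, 0.99
q_net = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, 1))
q_tgt = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, 1))
q_tgt.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=3e-4)

def q(net, s, a):
    return net(torch.cat([s, a], dim=-1)).squeeze(-1)

B = 32  # toy batches standing in for dataset and model-rollout data
s, a = torch.randn(B, state_dim), torch.randn(B, action_dim)
r, s2, a2 = torch.randn(B), torch.randn(B, state_dim), torch.randn(B, action_dim)
s_m, a_m = torch.randn(B, state_dim), torch.randn(B, action_dim)  # model rollouts

with torch.no_grad():
    target = r + gamma * q(q_tgt, s2, a2)
bellman = ((q(q_net, s, a) - target) ** 2).mean()
# Conservatism: push Q down on model-generated pairs, up on dataset pairs.
penalty = q(q_net, s_m, a_m).mean() - q(q_net, s, a).mean()
loss = bellman + beta * penalty
opt.zero_grad(); loss.backward(); opt.step()
```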

RvS: What is essential for offline RL via supervised learning?

S Emmons, B Eysenbach, I Kostrikov… - arXiv preprint arXiv …, 2021 - arxiv.org
Recent work has shown that supervised learning alone, without temporal difference (TD)
learning, can be remarkably effective for offline RL. When does this hold true, and which …
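
The "supervised learning alone" recipe the snippet refers to is typically outcome-conditioned behavioral cloning: condition a policy on a desired outcome (e.g., reward-to-go or a goal) and fit it by regression on logged actions, with no TD bootstrapping. A minimal sketch under toy data; names are placeholders, not the paper's exact setup.

```python
# Hedged sketch of outcome-conditioned behavioral cloning: the policy sees
# (state, desired outcome) and is trained with a plain regression loss.
import torch
import torch.nn as nn

state_dim, action_dim = 4, 2
policy = nn.Sequential(nn.Linear(state_dim + 1, 64), nn.ReLU(), nn.Linear(64, action_dim))
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

B = 32
states = torch.randn(B, state_dim)
actions = torch.randn(B, action_dim)   # logged actions from the dataset
reward_to_go = torch.randn(B, 1)       # outcome computed from logged returns

pred = policy(torch.cat([states, reward_to_go], dim=-1))
loss = ((pred - actions) ** 2).mean()  # no TD learning anywhere
opt.zero_grad(); loss.backward(); opt.step()

# At test time, one conditions on a high desired outcome to select actions.
```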

Mildly conservative Q-learning for offline reinforcement learning

J Lyu, X Ma, X Li, Z Lu - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) defines the task of learning from a static logged dataset
without continually interacting with the environment. The distribution shift between the …
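
One way to be "mildly" rather than maximally conservative about the distribution shift the snippet mentions is to regress Q-values at out-of-distribution (OOD) actions toward an in-distribution estimate instead of an arbitrarily low value. The sketch below illustrates that flavor of regularizer; it is not the paper's exact Bellman operator, and the toy data are placeholders.

```python
# Hedged sketch of "mild" conservatism: OOD actions are pulled toward an
# in-distribution value rather than being pushed arbitrarily low.
import torch
import torch.nn as nn

state_dim, action_dim = 4, 2
q_net = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(q_net.parameters(), lr=3e-4)

def q(s, a):
    return q_net(torch.cat([s, a], dim=-1)).squeeze(-1)

B = 32
s = torch.randn(B, state_dim)
a_data = torch.randn(B, action_dim)  # behavior (in-dataset) actions
a_ood = torch.randn(B, action_dim)   # sampled OOD actions
td_target = torch.randn(B)           # ordinary TD target on dataset actions

with torch.no_grad():
    pseudo = q(s, a_data)            # pseudo-target: in-distribution value
loss = ((q(s, a_data) - td_target) ** 2).mean() \
     + ((q(s, a_ood) - pseudo) ** 2).mean()
opt.zero_grad(); loss.backward(); opt.step()
```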

Provable benefits of actor-critic methods for offline reinforcement learning

A Zanette, MJ Wainwright… - Advances in neural …, 2021 - proceedings.neurips.cc
Actor-critic methods are widely used in offline reinforcement learning practice, but are not so
well-understood theoretically. We propose a new offline actor-critic algorithm that naturally …
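
Analyses in this line typically formalize pessimism as evaluating each candidate policy with the least favorable critic from a data-consistent confidence set; schematically (notation mine, not the paper's exact construction):

$$\widehat{Q}^{\pi} \in \arg\min_{Q \in \mathcal{C}(\mathcal{D})} \; \mathbb{E}_{s \sim \rho,\; a \sim \pi(\cdot \mid s)}\big[Q(s,a)\big], \qquad \widehat{\pi} \in \arg\max_{\pi} \; \mathbb{E}_{s \sim \rho}\big[\widehat{Q}^{\pi}(s, \pi(s))\big],$$

where $\mathcal{C}(\mathcal{D})$ is a set of critics approximately consistent with the Bellman equations on the logged dataset $\mathcal{D}$ and $\rho$ is an initial-state distribution.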

Pessimistic bootstrapping for uncertainty-driven offline reinforcement learning

C Bai, L Wang, Z Yang, Z Deng, A Garg, P Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
Offline Reinforcement Learning (RL) aims to learn policies from previously collected
datasets without exploring the environment. Directly applying off-policy algorithms to offline …
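
A common uncertainty-driven form of pessimism, in the spirit of the snippet, trains a bootstrapped Q-ensemble and penalizes TD targets by ensemble disagreement (a lower confidence bound). The sketch below shows that generic construction, not the paper's exact algorithm; hyperparameters and toy data are placeholders.

```python
# Hedged sketch of ensemble-based pessimism: the TD target is the ensemble
# mean minus beta times the ensemble standard deviation (an LCB).
import torch
import torch.nn as nn

state_dim, action_dim, n_ens, beta, gamma = 4, 2, 5, 1.0, 0.99
ensemble = nn.ModuleList([
    nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, 1))
    for _ in range(n_ens)
])

def q_all(s, a):
    x = torch.cat([s, a], dim=-1)
    return torch.stack([net(x).squeeze(-1) for net in ensemble])  # (n_ens, B)

B = 32
s2, a2, r = torch.randn(B, state_dim), torch.randn(B, action_dim), torch.randn(B)
with torch.no_grad():
    qs = q_all(s2, a2)
    # Lower confidence bound over the bootstrapped ensemble.
    target = r + gamma * (qs.mean(0) - beta * qs.std(0))
# Each ensemble member would then regress toward this pessimistic target.
```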

Reward model ensembles help mitigate overoptimization

T Coste, U Anwar, R Kirk, D Krueger - arXiv preprint arXiv:2310.02743, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a standard approach for fine-tuning
large language models to follow instructions. As part of this process, learned reward models …
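
The basic mechanism studied in this line of work is to score each response with several independently trained reward models and aggregate conservatively, so the policy cannot exploit any single model's errors. A minimal sketch; the particular aggregations shown (min over members, mean minus a disagreement penalty) are common illustrative choices, not necessarily the paper's.

```python
# Hedged sketch of conservative reward-ensemble aggregation for RLHF.
import torch

def aggregate(scores: torch.Tensor, mode: str = "lcb", beta: float = 1.0) -> torch.Tensor:
    """scores: (n_models, batch) rewards from each ensemble member."""
    if mode == "min":                          # worst-case over members
        return scores.min(dim=0).values
    if mode == "lcb":                          # mean minus a disagreement penalty
        return scores.mean(dim=0) - beta * scores.std(dim=0)
    return scores.mean(dim=0)                  # plain mean (least conservative)

scores = torch.randn(4, 8)                     # 4 reward models x 8 responses (toy)
rl_reward = aggregate(scores, mode="lcb")      # fed to the RLHF policy update
```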

Iterative preference learning from human feedback: Bridging theory and practice for RLHF under KL-constraint

W Xiong, H Dong, C Ye, Z Wang, H Zhong… - … on Machine Learning, 2024 - openreview.net
This paper studies the theoretical framework of the alignment process of generative models
with Reinforcement Learning from Human Feedback (RLHF). We consider a standard …
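
For context, the KL-constrained objective such analyses start from is

$$\max_{\pi}\; \mathbb{E}_{y \sim \pi(\cdot \mid x)}\big[r(x,y)\big] - \beta\, \mathrm{KL}\big(\pi(\cdot \mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot \mid x)\big),$$

whose maximizer has the standard Gibbs form

$$\pi^{*}(y \mid x) = \frac{1}{Z(x)}\,\pi_{\mathrm{ref}}(y \mid x)\,\exp\big(r(x,y)/\beta\big), \qquad Z(x) = \sum_{y} \pi_{\mathrm{ref}}(y \mid x)\,\exp\big(r(x,y)/\beta\big).$$

This closed form is a well-known fact in the RLHF literature; the notation here is generic, not necessarily the paper's.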

A policy-guided imitation approach for offline reinforcement learning

H Xu, L Jiang, J Li… - Advances in Neural …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) methods can generally be categorized into two types: RL-
based and Imitation-based. RL-based methods could in principle enjoy out-of-distribution …
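
The decomposition such policy-guided approaches use can be sketched as two supervised components: a guide that proposes a desirable next state, and an imitation-style executor (an inverse-dynamics-like model) that outputs the action reaching it. The code below is an illustrative skeleton under toy data, not the paper's exact losses.

```python
# Hedged sketch of a guide/executor split: both parts are trained purely
# by supervised regression on logged transitions (s, a, s').
import torch
import torch.nn as nn

state_dim, action_dim = 4, 2
guide = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, state_dim))
executor = nn.Sequential(nn.Linear(2 * state_dim, 64), nn.ReLU(), nn.Linear(64, action_dim))
opt = torch.optim.Adam(list(guide.parameters()) + list(executor.parameters()), lr=3e-4)

B = 32
s, a, s2 = torch.randn(B, state_dim), torch.randn(B, action_dim), torch.randn(B, state_dim)

# Guide imitates observed next states; executor imitates the logged action
# given (state, next state).
loss = ((guide(s) - s2) ** 2).mean() \
     + ((executor(torch.cat([s, s2], dim=-1)) - a) ** 2).mean()
opt.zero_grad(); loss.backward(); opt.step()

# Acting: s2_hat = guide(s); action = executor(cat(s, s2_hat)).
```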

Conformal prediction for uncertainty-aware planning with diffusion dynamics model

J Sun, Y Jiang, J Qiu, P Nobel… - Advances in …, 2024 - proceedings.neurips.cc
Robotic applications often involve working in environments that are uncertain, dynamic, and
partially observable. Recently, diffusion models have been proposed for learning trajectory …
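
The generic tool in the title, split conformal prediction, can be sketched independently of the diffusion-model details: calibrate a score threshold on held-out data so that prediction sets cover the truth with probability at least 1 − α. A minimal sketch; the error score and its use as an "uncertainty radius" for planning are illustrative assumptions.

```python
# Hedged sketch of split conformal prediction around a learned dynamics model.
import numpy as np

rng = np.random.default_rng(0)
# Nonconformity scores on a calibration set, e.g. prediction errors
# |model(s, a) - s_next| from a learned (e.g. diffusion) dynamics model.
cal_scores = np.abs(rng.normal(size=500))

alpha = 0.1
n = len(cal_scores)
# Finite-sample-corrected quantile (standard split conformal).
q = np.quantile(cal_scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Any prediction within distance q of the model output is in the (1 - alpha)
# prediction set; a planner can treat q as an uncertainty radius.
new_score = 1.3
print("inside prediction set:", new_score <= q)
```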