Dataset reset policy optimization for RLHF

JD Chang, W Zhan, O Oertell, K Brantley… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement Learning (RL) from Human Preference-based feedback is a popular
paradigm for fine-tuning generative models, which has produced impressive models such as …

Distributionally robust model-based reinforcement learning with large state spaces

SS Ramesh, PG Sessa, Y Hu… - International …, 2024 - proceedings.mlr.press
Three major challenges in reinforcement learning are the complex dynamical systems with
large state spaces, the costly data acquisition processes, and the deviation of real-world …

Minimax-optimal multi-agent RL in Markov games with a generative model

G Li, Y Chi, Y Wei, Y Chen - Advances in Neural …, 2022 - proceedings.neurips.cc
This paper studies multi-agent reinforcement learning in Markov games, with the goal of
learning Nash equilibria or coarse correlated equilibria (CCE) sample-optimally. All prior …
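As background for the solution concepts named in this snippet, a coarse correlated equilibrium (CCE) in the single-state (normal-form) case is a joint action distribution $\pi$ from which no player gains by deviating to any fixed action before the joint action is drawn; this is a standard textbook definition, not notation taken from the paper:

$$\mathbb{E}_{a \sim \pi}\big[u_i(a)\big] \;\ge\; \mathbb{E}_{a \sim \pi}\big[u_i(a_i', a_{-i})\big] \qquad \text{for every player } i \text{ and every fixed action } a_i'.$$

Nash equilibria are the special case in which $\pi$ factorizes into independent per-player strategies, which is why CCE are the more tractable target for sample-efficient learning.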

Hardness of independent learning and sparse equilibrium computation in Markov games

DJ Foster, N Golowich… - … Conference on Machine …, 2023 - proceedings.mlr.press
We consider the problem of decentralized multi-agent reinforcement learning in Markov
games. A fundamental question is whether there exist algorithms that, when run …

Online RL in Linearly $q^\pi$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore

G Weisz, A György… - Advances in Neural …, 2024 - proceedings.neurips.cc
We consider online reinforcement learning (RL) in episodic Markov decision processes
(MDPs) under the linear $q^\pi$-realizability assumption, where it is assumed that the …
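For context, the linear $q^\pi$-realizability condition referenced in this and several of the following entries is usually stated as follows (generic notation, not the paper's): given a known feature map $\phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d$, every policy's action-value function is linear in the features,

$$\forall \pi \;\; \exists\, \theta_\pi \in \mathbb{R}^d,\ \|\theta_\pi\| \le B: \qquad q^\pi(s,a) = \langle \phi(s,a), \theta_\pi \rangle \ \ \text{for all } (s,a),$$

which is weaker than the linear MDP assumption but stronger than requiring only the optimal $q^\star$ to be linear.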

Exponential hardness of reinforcement learning with linear function approximation

S Liu, G Mahajan, D Kane, S Lovett… - The Thirty Sixth …, 2023 - proceedings.mlr.press
A fundamental question in reinforcement learning theory is: suppose the optimal value
functions are linear in given features, can we learn them efficiently? This problem's …

Sample-efficient reinforcement learning is feasible for linearly realizable MDPs with limited revisiting

G Li, Y Chen, Y Chi, Y Gu… - Advances in Neural …, 2021 - proceedings.neurips.cc
Low-complexity models such as linear function representation play a pivotal role in enabling
sample-efficient reinforcement learning (RL). The current paper pertains to a scenario with …

Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs

G Weisz, A György, T Kozuno… - Advances in Neural …, 2022 - proceedings.neurips.cc
We consider approximate dynamic programming in $\gamma$-discounted Markov decision
processes and apply it to approximate planning with linear value-function approximation …

Efficient global planning in large MDPs via stochastic primal-dual optimization

G Neu, N Okolo - International Conference on Algorithmic …, 2023 - proceedings.mlr.press
We propose a new stochastic primal-dual optimization algorithm for planning in a large
discounted Markov decision process with a generative model and linear function …
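As background for the primal-dual viewpoint named here, planning in a $\gamma$-discounted MDP can be posed as a linear program over state-action occupancy measures, and stochastic primal-dual methods work with the associated Lagrangian saddle point (a generic sketch in standard notation, not taken from the paper):

$$\max_{\mu \ge 0} \ \langle \mu, r \rangle \ \ \text{s.t.}\ \ E^\top \mu = (1-\gamma)\,\nu_0 + \gamma P^\top \mu, \qquad \min_{V}\,\max_{\mu \ge 0}\ \langle \mu,\ r + \gamma P V - E V\rangle + (1-\gamma)\langle \nu_0, V\rangle,$$

where $\mu$ is the occupancy measure, $\nu_0$ the initial-state distribution, $P$ the transition matrix from state-action pairs to states, and $E$ the matrix that marginalizes $\mu$ over actions.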

Can agents run relay race with strangers? Generalization of RL to out-of-distribution trajectories

LC Lan, H Zhang, CJ Hsieh - arXiv preprint arXiv:2304.13424, 2023 - arxiv.org
In this paper, we define, evaluate, and improve the "relay-generalization" performance of
reinforcement learning (RL) agents on the out-of-distribution "controllable" states. Ideally, an …
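To make the "relay" evaluation concrete, here is a minimal sketch of how such a handover test could be run, assuming a Gymnasium-style environment and two policy callables; the names policy_a, policy_b, and the handover step k are illustrative, not the paper's interface.

# Minimal relay-evaluation sketch (assumes a Gymnasium-style env API and two
# callables mapping observations to actions; names are hypothetical).
import gymnasium as gym

def relay_return(env, policy_a, policy_b, k, max_steps=1000):
    """Run policy_a for k steps, then hand the episode over to policy_b.

    The reward collected after the handover measures how well policy_b copes
    with states produced by a 'stranger' policy.
    """
    obs, _ = env.reset()
    reward_after_handover = 0.0
    for t in range(max_steps):
        policy = policy_a if t < k else policy_b
        obs, reward, terminated, truncated, _ = env.step(policy(obs))
        if t >= k:
            reward_after_handover += reward
        if terminated or truncated:
            break
    return reward_after_handover

# Example usage (hypothetical policies and environment):
# env = gym.make("HalfCheetah-v4")
# score = relay_return(env, policy_a, policy_b, k=100)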