Dataset reset policy optimization for RLHF
Reinforcement Learning (RL) from Human Preference-based feedback is a popular
paradigm for fine-tuning generative models, which has produced impressive models such as …
Distributionally robust model-based reinforcement learning with large state spaces
Three major challenges in reinforcement learning are the complex dynamical systems with
large state spaces, the costly data acquisition processes, and the deviation of real-world …
Minimax-optimal multi-agent RL in Markov games with a generative model
This paper studies multi-agent reinforcement learning in Markov games, with the goal of
learning Nash equilibria or coarse correlated equilibria (CCE) sample-optimally. All prior …
Hardness of independent learning and sparse equilibrium computation in Markov games
DJ Foster, N Golowich… - … Conference on Machine …, 2023 - proceedings.mlr.press
We consider the problem of decentralized multi-agent reinforcement learning in Markov
games. A fundamental question is whether there exist algorithms that, when run …
Online RL in Linearly $q^\pi$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore
We consider online reinforcement learning (RL) in episodic Markov decision processes
(MDPs) under the linear $ q^\pi $-realizability assumption, where it is assumed that the …
Exponential hardness of reinforcement learning with linear function approximation
A fundamental question in reinforcement learning theory is: suppose the optimal value
functions are linear in given features, can we learn them efficiently? This problem's …
Sample-efficient reinforcement learning is feasible for linearly realizable MDPs with limited revisiting
Low-complexity models such as linear function representation play a pivotal role in enabling
sample-efficient reinforcement learning (RL). The current paper pertains to a scenario with …
Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs
We consider approximate dynamic programming in $\gamma $-discounted Markov decision
processes and apply it to approximate planning with linear value-function approximation …
Efficient global planning in large MDPs via stochastic primal-dual optimization
We propose a new stochastic primal-dual optimization algorithm for planning in a large
discounted Markov decision process with a generative model and linear function …
Can agents run relay race with strangers? Generalization of RL to out-of-distribution trajectories
In this paper, we define, evaluate, and improve the "relay-generalization" performance of
reinforcement learning (RL) agents on the out-of-distribution "controllable" states. Ideally, an …