Reinforcement learning in low-rank MDPs with density features

A Huang, J Chen, N Jiang - International Conference on …, 2023 - proceedings.mlr.press
MDPs with low-rank transitions—that is, the transition matrix can be factored into the product
of two matrices, left and right—is a highly representative structure that enables tractable …

Convex reinforcement learning in finite trials

M Mutti, R De Santi, P De Bartolomeis… - Journal of Machine …, 2023 - jmlr.org
Convex Reinforcement Learning (RL) is a recently introduced framework that generalizes
the standard RL objective to any convex (or concave) function of the state distribution …

Reinforcement learning with general utilities: Simpler variance reduction and large state-action space

A Barakat, I Fatkhullin, N He - International Conference on …, 2023 - proceedings.mlr.press
We consider the reinforcement learning (RL) problem with general utilities which consists in
maximizing a function of the state-action occupancy measure. Beyond the standard …

On the limitations of Markovian rewards to express multi-objective, risk-sensitive, and modal tasks

J Skalse, A Abate - Uncertainty in Artificial Intelligence, 2023 - proceedings.mlr.press
In this paper, we study the expressivity of scalar, Markovian reward functions in
Reinforcement Learning (RL), and identify several limitations to what they can express …

A coupled flow approach to imitation learning

GJ Freund, E Sarafian, S Kraus - … Conference on Machine …, 2023 - proceedings.mlr.press
In reinforcement learning and imitation learning, an object of central importance is the state
distribution induced by the policy. It plays a crucial role in the policy gradient theorem, and …

Submodular reinforcement learning

M Prajapat, M Mutný, MN Zeilinger… - arXiv preprint arXiv …, 2023 - arxiv.org
In reinforcement learning (RL), rewards of states are typically considered additive, and
following the Markov assumption, they are independent of states visited …

Learning diffusion at lightspeed

A Terpin, N Lanzetti, M Gadea, F Dörfler - arXiv preprint arXiv:2406.12616, 2024 - arxiv.org
Diffusion regulates numerous natural processes and the dynamics of many successful
generative models. Existing models to learn the diffusion terms from observational data rely …

Global reinforcement learning: Beyond linear and convex rewards via submodular semi-gradient methods

R De Santi, M Prajapat, A Krause - arXiv preprint arXiv:2407.09905, 2024 - arxiv.org
In classic Reinforcement Learning (RL), the agent maximizes an additive objective of the
visited states, e.g., a value function. Unfortunately, objectives of this type cannot model many …

Three dogmas of reinforcement learning

D Abel, MK Ho, A Harutyunyan - arXiv preprint arXiv:2407.10583, 2024 - arxiv.org
Modern reinforcement learning has been conditioned by at least three dogmas. The first is
the environment spotlight, which refers to our tendency to focus on modeling environments …

Learning Equilibria in Adversarial Team Markov Games: A Nonconvex-Hidden-Concave Min-Max Optimization Problem

F Kalogiannis, J Yan, I Panageas - arXiv preprint arXiv:2410.05673, 2024 - arxiv.org
We study the problem of learning a Nash equilibrium (NE) in Markov games which is a
cornerstone in multi-agent reinforcement learning (MARL). In particular, we focus on infinite …