Reinforcement learning in low-rank MDPs with density features
MDPs with low-rank transitions—that is, the transition matrix can be factored into the product
of two matrices, left and right—is a highly representative structure that enables tractable …
Convex reinforcement learning in finite trials
Convex Reinforcement Learning (RL) is a recently introduced framework that generalizes
the standard RL objective to any convex (or concave) function of the state distribution …
Reinforcement learning with general utilities: Simpler variance reduction and large state-action space
We consider the reinforcement learning (RL) problem with general utilities which consists in
maximizing a function of the state-action occupancy measure. Beyond the standard …
On the limitations of Markovian rewards to express multi-objective, risk-sensitive, and modal tasks
In this paper, we study the expressivity of scalar, Markovian reward functions in
Reinforcement Learning (RL), and identify several limitations to what they can express …
A coupled flow approach to imitation learning
In reinforcement learning and imitation learning, an object of central importance is the state
distribution induced by the policy. It plays a crucial role in the policy gradient theorem, and …
Submodular reinforcement learning
In reinforcement learning (RL), rewards of states are typically considered additive, and
following the Markov assumption, they are $\textit{independent}$ of states visited …
Learning diffusion at lightspeed
Diffusion regulates numerous natural processes and the dynamics of many successful
generative models. Existing models to learn the diffusion terms from observational data rely …
Global reinforcement learning: Beyond linear and convex rewards via submodular semi-gradient methods
In classic Reinforcement Learning (RL), the agent maximizes an additive objective of the
visited states, e.g., a value function. Unfortunately, objectives of this type cannot model many …
Three dogmas of reinforcement learning
Modern reinforcement learning has been conditioned by at least three dogmas. The first is
the environment spotlight, which refers to our tendency to focus on modeling environments …
Learning Equilibria in Adversarial Team Markov Games: A Nonconvex-Hidden-Concave Min-Max Optimization Problem
F Kalogiannis, J Yan, I Panageas - arXiv preprint arXiv:2410.05673, 2024 - arxiv.org
We study the problem of learning a Nash equilibrium (NE) in Markov games which is a
cornerstone in multi-agent reinforcement learning (MARL). In particular, we focus on infinite …