Challenging common assumptions in convex reinforcement learning

A Huang, J Chen, N Jiang - International Conference on …, 2023 - proceedings.mlr.press

MDPs with low-rank transitions—that is, the transition matrix can be factored into the product
of two matrices, left and right—is a highly representative structure that enables tractable …

Cited by 19 - Related articles - All 8 versions

[PDF] jmlr.org

Convex reinforcement learning in finite trials

M Mutti, R De Santi, P De Bartolomeis… - Journal of Machine …, 2023 - jmlr.org

Convex Reinforcement Learning (RL) is a recently introduced framework that generalizes
the standard RL objective to any convex (or concave) function of the state distribution …

Cited by 14 - Related articles - All 5 versions

[PDF] mlr.press

Reinforcement learning with general utilities: Simpler variance reduction and large state-action space

A Barakat, I Fatkhullin, N He - International Conference on …, 2023 - proceedings.mlr.press

We consider the reinforcement learning (RL) problem with general utilities which consists in
maximizing a function of the state-action occupancy measure. Beyond the standard …

Cited by 16 - Related articles - All 7 versions

[PDF] mlr.press

On the limitations of Markovian rewards to express multi-objective, risk-sensitive, and modal tasks

J Skalse, A Abate - Uncertainty in Artificial Intelligence, 2023 - proceedings.mlr.press

In this paper, we study the expressivity of scalar, Markovian reward functions in
Reinforcement Learning (RL), and identify several limitations to what they can express …

Cited by 12 - Related articles - All 8 versions

[PDF] mlr.press

A coupled flow approach to imitation learning

GJ Freund, E Sarafian, S Kraus - … Conference on Machine …, 2023 - proceedings.mlr.press

In reinforcement learning and imitation learning, an object of central importance is the state
distribution induced by the policy. It plays a crucial role in the policy gradient theorem, and …

Cited by 13 - Related articles - All 7 versions

[PDF] arxiv.org

Submodular reinforcement learning

M Prajapat, M Mutný, MN Zeilinger… - arXiv preprint arXiv …, 2023 - arxiv.org

In reinforcement learning (RL), rewards of states are typically considered additive, and
following the Markov assumption, they are $\textit{independent}$ of states visited …

Cited by 12 - Related articles - All 5 versions

[PDF] arxiv.org

Learning diffusion at lightspeed

A Terpin, N Lanzetti, M Gadea, F Dörfler - arXiv preprint arXiv:2406.12616, 2024 - arxiv.org

Diffusion regulates numerous natural processes and the dynamics of many successful
generative models. Existing models to learn the diffusion terms from observational data rely …

Cited by 3 - Related articles - All 3 versions

[PDF] arxiv.org

Global reinforcement learning: Beyond linear and convex rewards via submodular semi-gradient methods

R De Santi, M Prajapat, A Krause - arXiv preprint arXiv:2407.09905, 2024 - arxiv.org

In classic Reinforcement Learning (RL), the agent maximizes an additive objective of the
visited states, e.g., a value function. Unfortunately, objectives of this type cannot model many …

Cited by 3 - Related articles

[PDF] arxiv.org

Three dogmas of reinforcement learning

D Abel, MK Ho, A Harutyunyan - arXiv preprint arXiv:2407.10583, 2024 - arxiv.org

Modern reinforcement learning has been conditioned by at least three dogmas. The first is
the environment spotlight, which refers to our tendency to focus on modeling environments …

Cited by 2 - Related articles - All 6 versions

[PDF] arxiv.org

Learning Equilibria in Adversarial Team Markov Games: A Nonconvex-Hidden-Concave Min-Max Optimization Problem

F Kalogiannis, J Yan, I Panageas - arXiv preprint arXiv:2410.05673, 2024 - arxiv.org

We study the problem of learning a Nash equilibrium (NE) in Markov games which is a
cornerstone in multi-agent reinforcement learning (MARL). In particular, we focus on infinite …

Cited by 1 - Related articles - All 3 versions