Beyond average return in Markov decision processes

A Marthe, A Garivier, C Vernade - Advances in Neural …, 2024 - proceedings.neurips.cc
What are the functionals of the reward that can be computed and optimized exactly in
Markov Decision Processes? In the finite-horizon, undiscounted setting, Dynamic …

Global reinforcement learning: Beyond linear and convex rewards via submodular semi-gradient methods

R De Santi, M Prajapat, A Krause - arXiv preprint arXiv:2407.09905, 2024 - arxiv.org
In classic Reinforcement Learning (RL), the agent maximizes an additive objective of the
visited states, e.g., a value function. Unfortunately, objectives of this type cannot model many …

Three dogmas of reinforcement learning

D Abel, MK Ho, A Harutyunyan - arXiv preprint arXiv:2407.10583, 2024 - arxiv.org
Modern reinforcement learning has been conditioned by at least three dogmas. The first is
the environment spotlight, which refers to our tendency to focus on modeling environments …

Offline reinforcement learning in large state spaces: Algorithms and guarantees

N Jiang, T Xie - Statistical Science, 2024 - nanjiang.cs.illinois.edu
This article introduces the theory of offline reinforcement learning in large state spaces,
where good policies are learned from historical data without online interactions with the …

Optimal dynamic fixed-mix portfolios based on reinforcement learning with second order stochastic dominance

G Consigli, AA Gomez, JP Zubelli - Engineering Applications of Artificial …, 2024 - Elsevier
We propose a reinforcement learning (RL) approach to address a multiperiod optimization
problem in which a portfolio manager seeks an optimal constant proportion portfolio strategy …

Generalizing Objective-Specification in Markov Decision Processes

PP Santos - Proceedings of the 23rd International Conference on …, 2024 - ifaamas.org
In this thesis, we address general utility Markov decision processes (GUMDPs), which
generalize the standard Markov decision processes (MDPs) framework for decision-making …

Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis

Q Zhang, H Wei, L Ying - arXiv preprint arXiv:2406.07455, 2024 - arxiv.org
In this paper, we study reinforcement learning from human feedback (RLHF) under an
episodic Markov decision process with a general trajectory-wise reward model. We …

Inverse Concave-Utility Reinforcement Learning is Inverse Game Theory

MM Çelikok, FA Oliehoek, JW van de Meent - arXiv preprint arXiv …, 2024 - arxiv.org
We consider inverse reinforcement learning problems with concave utilities. Concave Utility
Reinforcement Learning (CURL) is a generalisation of the standard RL objective, which …

Beyond Expected Returns: A Policy Gradient Algorithm for Cumulative Prospect Theoretic Reinforcement Learning

O Lepel, A Barakat - arXiv preprint arXiv:2410.02605, 2024 - arxiv.org
The widely used expected utility theory has been shown to be empirically inconsistent with
human preferences in the psychology and behavioral economy literatures. Cumulative …

Geometric active exploration in Markov decision processes: the benefit of abstraction

R De Santi, FA Joseph, N Liniger, M Mutti… - arXiv preprint arXiv …, 2024 - arxiv.org
How can a scientist use a Reinforcement Learning (RL) algorithm to design experiments
over a dynamical system's state space? In the case of finite and Markovian systems, an area …