Beyond average return in Markov decision processes
What are the functionals of the reward that can be computed and optimized exactly in
Markov Decision Processes? In the finite-horizon, undiscounted setting, Dynamic …
Global reinforcement learning: Beyond linear and convex rewards via submodular semi-gradient methods
In classic Reinforcement Learning (RL), the agent maximizes an additive objective of the
visited states, e.g., a value function. Unfortunately, objectives of this type cannot model many …
Three dogmas of reinforcement learning
Modern reinforcement learning has been conditioned by at least three dogmas. The first is
the environment spotlight, which refers to our tendency to focus on modeling environments …
Offline reinforcement learning in large state spaces: Algorithms and guarantees
N Jiang, T Xie - Statistical Science, 2024 - nanjiang.cs.illinois.edu
This article introduces the theory of offline reinforcement learning in large state spaces,
where good policies are learned from historical data without online interactions with the …
Optimal dynamic fixed-mix portfolios based on reinforcement learning with second order stochastic dominance
We propose a reinforcement learning (RL) approach to address a multiperiod optimization
problem in which a portfolio manager seeks an optimal constant proportion portfolio strategy …
Generalizing Objective-Specification in Markov Decision Processes
PP Santos - Proceedings of the 23rd International Conference on …, 2024 - ifaamas.org
In this thesis, we address general utility Markov decision processes (GUMDPs), which
generalize the standard Markov decision processes (MDPs) framework for decision-making …
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
In this paper, we study reinforcement learning from human feedback (RLHF) under an
episodic Markov decision process with a general trajectory-wise reward model. We …
Inverse Concave-Utility Reinforcement Learning is Inverse Game Theory
MM Çelikok, FA Oliehoek, JW van de Meent - arXiv preprint arXiv …, 2024 - arxiv.org
We consider inverse reinforcement learning problems with concave utilities. Concave Utility
Reinforcement Learning (CURL) is a generalisation of the standard RL objective, which …
Beyond Expected Returns: A Policy Gradient Algorithm for Cumulative Prospect Theoretic Reinforcement Learning
The widely used expected utility theory has been shown to be empirically inconsistent with
human preferences in the psychology and behavioral economy literatures. Cumulative …
Geometric active exploration in Markov decision processes: the benefit of abstraction
How can a scientist use a Reinforcement Learning (RL) algorithm to design experiments
over a dynamical system's state space? In the case of finite and Markovian systems, an area …