Convex reinforcement learning in finite trials

A Marthe, A Garivier, C Vernade - Advances in Neural …, 2024 - proceedings.neurips.cc

What are the functionals of the reward that can be computed and optimized exactly in
Markov Decision Processes? In the finite-horizon, undiscounted setting, Dynamic …

Cited by 8 · Related articles · All 10 versions

[PDF] arxiv.org

Global reinforcement learning: Beyond linear and convex rewards via submodular semi-gradient methods

R De Santi, M Prajapat, A Krause - arXiv preprint arXiv:2407.09905, 2024 - arxiv.org

In classic Reinforcement Learning (RL), the agent maximizes an additive objective of the
visited states, e.g., a value function. Unfortunately, objectives of this type cannot model many …

Cited by 3 · Related articles

[PDF] arxiv.org

Three dogmas of reinforcement learning

D Abel, MK Ho, A Harutyunyan - arXiv preprint arXiv:2407.10583, 2024 - arxiv.org

Modern reinforcement learning has been conditioned by at least three dogmas. The first is
the environment spotlight, which refers to our tendency to focus on modeling environments …

Cited by 2 · Related articles · All 6 versions

[PDF] illinois.edu

[PDF] Offline reinforcement learning in large state spaces: Algorithms and guarantees

N Jiang, T Xie - Statistical Science, 2024 - nanjiang.cs.illinois.edu

This article introduces the theory of offline reinforcement learning in large state spaces,
where good policies are learned from historical data without online interactions with the …

Cited by 1 · Related articles · All 3 versions

[HTML] sciencedirect.com

[HTML] Optimal dynamic fixed-mix portfolios based on reinforcement learning with second order stochastic dominance

G Consigli, AA Gomez, JP Zubelli - Engineering Applications of Artificial …, 2024 - Elsevier

We propose a reinforcement learning (RL) approach to address a multiperiod optimization
problem in which a portfolio manager seeks an optimal constant proportion portfolio strategy …

Cited by 2 · Related articles

[PDF] ifaamas.org

[PDF] Generalizing Objective-Specification in Markov Decision Processes

PP Santos - Proceedings of the 23rd International Conference on …, 2024 - ifaamas.org

In this thesis, we address general utility Markov decision processes (GUMDPs), which
generalize the standard Markov decision processes (MDPs) framework for decision-making …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis

Q Zhang, H Wei, L Ying - arXiv preprint arXiv:2406.07455, 2024 - arxiv.org

In this paper, we study reinforcement learning from human feedback (RLHF) under an
episodic Markov decision process with a general trajectory-wise reward model. We …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

Inverse Concave-Utility Reinforcement Learning is Inverse Game Theory

MM Çelikok, FA Oliehoek, JW van de Meent - arXiv preprint arXiv …, 2024 - arxiv.org

We consider inverse reinforcement learning problems with concave utilities. Concave Utility
Reinforcement Learning (CURL) is a generalisation of the standard RL objective, which …

[PDF] arxiv.org

Beyond Expected Returns: A Policy Gradient Algorithm for Cumulative Prospect Theoretic Reinforcement Learning

O Lepel, A Barakat - arXiv preprint arXiv:2410.02605, 2024 - arxiv.org

The widely used expected utility theory has been shown to be empirically inconsistent with
human preferences in the psychology and behavioral economy literatures. Cumulative …

Geometric active exploration in Markov decision processes: the benefit of abstraction

R De Santi, FA Joseph, N Liniger, M Mutti… - arXiv preprint arXiv …, 2024 - arxiv.org

How can a scientist use a Reinforcement Learning (RL) algorithm to design experiments
over a dynamical system's state space? In the case of finite and Markovian systems, an area …