A survey of reinforcement learning from human feedback
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …
[BOOK][B] Distributional reinforcement learning
The first comprehensive guide to distributional reinforcement learning, providing a new
mathematical formalism for thinking about decisions from a probabilistic perspective …
Dynamic programming models for maximizing customer lifetime value: an overview
Customer lifetime value (CLV) is the most reliable indicator in direct marketing for measuring
the profitability of customers. This has motivated researchers to compete in building …
Quantile Markov decision processes
The goal of a traditional Markov decision process (MDP) is to maximize expected cumulative
reward over a defined horizon (possibly infinite). In many applications, however, a decision …
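As context for the entry above: the standard MDP objective it generalizes, maximizing expected cumulative reward, is commonly solved by value iteration. The sketch below is a toy illustration of that baseline objective, not code from the paper; the states, transitions, rewards, and discount factor are all made up for the example.

```python
# Toy value iteration for the expected-cumulative-reward MDP objective.
# All model parameters here are illustrative assumptions.
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9

rng = np.random.default_rng(0)
# P[a, s, s'] = transition probability; R[s, a] = expected immediate reward.
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.uniform(0, 1, size=(n_states, n_actions))

V = np.zeros(n_states)
for _ in range(1000):
    # Bellman optimality update: V(s) = max_a [R(s,a) + gamma * sum_s' P(s'|s,a) V(s')]
    Q = R + gamma * np.einsum("asn,n->sa", P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new
```

A quantile (or otherwise risk-sensitive) MDP replaces the expectation in this update with a statistic of the return distribution, which is what breaks the plain Bellman recursion above.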
Computational approaches for stochastic shortest path on succinct MDPs
We consider the stochastic shortest path (SSP) problem for succinct Markov decision
processes (MDPs), where the MDP consists of a set of variables, and a set of …
Conditional value-at-risk for reachability and mean payoff in Markov decision processes
J Křetínský, T Meggendorfer - Proceedings of the 33rd Annual ACM …, 2018 - dl.acm.org
We present the conditional value-at-risk (CVaR) in the context of Markov chains and Markov
decision processes with reachability and mean-payoff objectives. CVaR quantifies risk by …
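For readers unfamiliar with the risk measure in the entry above: on a finite sample of rewards, CVaR at level α can be estimated as the average of the worst α-fraction of outcomes. This is a generic empirical sketch, not the paper's algorithm; the sample data and α are illustrative.

```python
# Empirical CVaR sketch: average the worst ceil(alpha * n) reward samples.
# Data and alpha below are made-up illustrations.
import math

def empirical_cvar(samples, alpha):
    """Mean of the worst ceil(alpha * n) samples (lower tail of rewards)."""
    worst = sorted(samples)[: math.ceil(alpha * len(samples))]
    return sum(worst) / len(worst)

rewards = [5.0, -2.0, 3.0, 0.0, -1.0, 4.0, 2.0, 1.0]
cvar_25 = empirical_cvar(rewards, 0.25)  # mean of the two worst outcomes -> -1.5
```

Unlike the expectation, CVaR focuses on the unfavorable tail, which is why optimizing it over MDP policies requires machinery beyond standard value iteration.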
Risk-averse MDPs under reward ambiguity
We propose a distributionally robust return-risk model for Markov decision processes
(MDPs) under risk and reward ambiguity. The proposed model optimizes the weighted …
[PDF] Maximizing customer lifetime value using dynamic programming: Theoretical and practical implications
Dynamic programming models play a significant role in maximizing customer lifetime value
(CLV), in different market types including B2B, B2C, C2B, C2C and B2B2C. This paper …
Learning Risk Preferences in Markov Decision Processes: an Application to the Fourth Down Decision in Football
For decades, National Football League (NFL) coaches' observed fourth down decisions
have been largely inconsistent with prescriptions based on statistical models. In this paper …
Verification of Discrete-Time Markov Decision Processes
T Meggendorfer - 2021 - mediatum.ub.tum.de
In this thesis, we discuss the verification of discrete-time Markov decision processes (MDP).
First, we present two novel algorithms to efficiently compute mean-payoff queries on MDP …