Large-scale Markov decision problems via the linear programming dual

R Shariff, C Szepesvári - Advances in Neural Information …, 2020 - proceedings.neurips.cc

Large-scale Markov decision processes (MDPs) require planning algorithms with runtime
independent of the number of states of the MDP. We consider the planning problem in MDPs …

Cited by 28 · Related articles · All 10 versions

[PDF] neurips.cc

Empirical Gateaux derivatives for causal inference

M Jordan, Y Wang, A Zhou - Advances in Neural …, 2022 - proceedings.neurips.cc

We study a constructive procedure that approximates Gateaux derivatives for statistical
functionals by finite-differencing, with attention to causal inference functionals. We focus on …

Cited by 9 · Related articles · All 4 versions

[PDF] mlr.press

Robust satisficing MDPs

H Ruan, S Zhou, Z Chen, CP Ho - … Conference on Machine …, 2023 - proceedings.mlr.press

Despite being a fundamental building block for reinforcement learning, Markov decision
processes (MDPs) often suffer from ambiguity in model parameters. Robust MDPs are …

Cited by 3 · Related articles · All 6 versions

[PDF] neurips.cc

A maximum-entropy approach to off-policy evaluation in average-reward MDPs

N Lazic, D Yin, M Farajtabar, N Levine… - Advances in …, 2020 - proceedings.neurips.cc

This work focuses on off-policy evaluation (OPE) with function approximation in infinite-
horizon undiscounted Markov decision processes (MDPs). For MDPs that are ergodic and …

Cited by 11 · Related articles · All 9 versions

[PDF] neurips.cc

Large scale Markov decision processes with changing rewards

A Rivera Cardoso, H Wang… - Advances in Neural …, 2019 - proceedings.neurips.cc

We consider Markov Decision Processes (MDPs) where the rewards are unknown
and may change in an adversarial manner. We provide an algorithm that achieves a regret …

Cited by 12 · Related articles · All 7 versions

[PDF] mlr.press

No-regret learning with high-probability in adversarial Markov decision processes

M Ghasemi, A Hashemi, H Vikalo… - Uncertainty in Artificial …, 2021 - proceedings.mlr.press

In a variety of problems, a decision-maker is unaware of the loss function associated with a
task, yet it has to minimize this unknown loss in order to accomplish the task. Furthermore …

Cited by 4 · Related articles · All 6 versions

Online learning with implicit exploration in episodic Markov decision processes

M Ghasemi, A Hashemi, H Vikalo… - 2021 American Control …, 2021 - ieeexplore.ieee.org

A wide range of applications require autonomous agents that are capable of learning an a
priori unknown task. Additionally, an autonomous agent may be put in the same …

Cited by 4 · Related articles · All 2 versions

[PDF] arxiv.org

Data-Driven Influence Functions for Optimization-Based Causal Inference

MI Jordan, Y Wang, A Zhou - arXiv preprint arXiv:2208.13701, 2022 - arxiv.org

We study a constructive algorithm that approximates Gateaux derivatives for statistical
functionals by finite differencing, with a focus on functionals that arise in causal inference …

Cited by 3 · Related articles · All 2 versions

Predictive Estimation for Reinforcement Learning with Time-Varying Reward Functions

A Hashemi, A Upadhyay - 2023 57th Asilomar Conference on …, 2023 - ieeexplore.ieee.org

The Adversarial Markov Decision Process (MDP) serves as a framework for exploring
unknown and time-varying tasks within the framework of Reinforcement Learning (RL) …

[PDF] arxiv.org

Stochastic convex optimization for provably efficient apprenticeship learning

A Kamoutsi, G Banjac, J Lygeros - arXiv preprint arXiv:2201.00039, 2021 - arxiv.org

We consider large-scale Markov decision processes (MDPs) with an unknown cost function
and employ stochastic convex optimization tools to address the problem of imitation …

Cited by 1 · Related articles · All 3 versions