Efficient planning in large MDPs with weak linear function approximation

R Shariff, C Szepesvári - Advances in Neural Information …, 2020 - proceedings.neurips.cc
Large-scale Markov decision processes (MDPs) require planning algorithms with runtime
independent of the number of states of the MDP. We consider the planning problem in MDPs …

Empirical Gateaux derivatives for causal inference

M Jordan, Y Wang, A Zhou - Advances in Neural …, 2022 - proceedings.neurips.cc
We study a constructive procedure that approximates Gateaux derivatives for statistical
functionals by finite-differencing, with attention to causal inference functionals. We focus on …

Robust satisficing MDPs

H Ruan, S Zhou, Z Chen, CP Ho - … Conference on Machine …, 2023 - proceedings.mlr.press
Despite being a fundamental building block for reinforcement learning, Markov decision
processes (MDPs) often suffer from ambiguity in model parameters. Robust MDPs are …

A maximum-entropy approach to off-policy evaluation in average-reward MDPs

N Lazic, D Yin, M Farajtabar, N Levine… - Advances in …, 2020 - proceedings.neurips.cc
This work focuses on off-policy evaluation (OPE) with function approximation in infinite-
horizon undiscounted Markov decision processes (MDPs). For MDPs that are ergodic and …

Large scale Markov decision processes with changing rewards

A Rivera Cardoso, H Wang… - Advances in Neural …, 2019 - proceedings.neurips.cc
We consider Markov Decision Processes (MDPs) where the rewards are unknown
and may change in an adversarial manner. We provide an algorithm that achieves a regret …

No-regret learning with high-probability in adversarial Markov decision processes

M Ghasemi, A Hashemi, H Vikalo… - Uncertainty in Artificial …, 2021 - proceedings.mlr.press
In a variety of problems, a decision-maker is unaware of the loss function associated with a
task, yet it has to minimize this unknown loss in order to accomplish the task. Furthermore …

Online learning with implicit exploration in episodic Markov decision processes

M Ghasemi, A Hashemi, H Vikalo… - 2021 American Control …, 2021 - ieeexplore.ieee.org
A wide range of applications require autonomous agents that are capable of learning an a
priori unknown task. Additionally, an autonomous agent may be put in the same …

Data-Driven Influence Functions for Optimization-Based Causal Inference

MI Jordan, Y Wang, A Zhou - arXiv preprint arXiv:2208.13701, 2022 - arxiv.org
We study a constructive algorithm that approximates Gateaux derivatives for statistical
functionals by finite differencing, with a focus on functionals that arise in causal inference …

Predictive Estimation for Reinforcement Learning with Time-Varying Reward Functions

A Hashemi, A Upadhyay - 2023 57th Asilomar Conference on …, 2023 - ieeexplore.ieee.org
The Adversarial Markov Decision Process (MDP) serves as a framework for exploring
unknown and time-varying tasks within the framework of Reinforcement Learning (RL) …

Stochastic convex optimization for provably efficient apprenticeship learning

A Kamoutsi, G Banjac, J Lygeros - arXiv preprint arXiv:2201.00039, 2021 - arxiv.org
We consider large-scale Markov decision processes (MDPs) with an unknown cost function
and employ stochastic convex optimization tools to address the problem of imitation …