Efficient planning in large MDPs with weak linear function approximation
R Shariff, C Szepesvári - Advances in Neural Information …, 2020 - proceedings.neurips.cc
Large-scale Markov decision processes (MDPs) require planning algorithms with runtime
independent of the number of states of the MDP. We consider the planning problem in MDPs …
Empirical Gateaux derivatives for causal inference
We study a constructive procedure that approximates Gateaux derivatives for statistical
functionals by finite-differencing, with attention to causal inference functionals. We focus on …
Robust satisficing MDPs
Despite being a fundamental building block for reinforcement learning, Markov decision
processes (MDPs) often suffer from ambiguity in model parameters. Robust MDPs are …
A maximum-entropy approach to off-policy evaluation in average-reward MDPs
This work focuses on off-policy evaluation (OPE) with function approximation in infinite-
horizon undiscounted Markov decision processes (MDPs). For MDPs that are ergodic and …
Large scale Markov decision processes with changing rewards
A Rivera Cardoso, H Wang… - Advances in Neural …, 2019 - proceedings.neurips.cc
Abstract We consider Markov Decision Processes (MDPs) where the rewards are unknown
and may change in an adversarial manner. We provide an algorithm that achieves a regret …
No-regret learning with high-probability in adversarial Markov decision processes
In a variety of problems, a decision-maker is unaware of the loss function associated with a
task, yet it has to minimize this unknown loss in order to accomplish the task. Furthermore …
Online learning with implicit exploration in episodic Markov decision processes
A wide range of applications require autonomous agents that are capable of learning an a
priori unknown task. Additionally, an autonomous agent may be put in the same …
Data-Driven Influence Functions for Optimization-Based Causal Inference
We study a constructive algorithm that approximates Gateaux derivatives for statistical
functionals by finite differencing, with a focus on functionals that arise in causal inference …
Predictive Estimation for Reinforcement Learning with Time-Varying Reward Functions
A Hashemi, A Upadhyay - 2023 57th Asilomar Conference on …, 2023 - ieeexplore.ieee.org
The Adversarial Markov Decision Process (MDP) serves as a framework for exploring
unknown and time-varying tasks within the framework of Reinforcement Learning (RL) …
Stochastic convex optimization for provably efficient apprenticeship learning
We consider large-scale Markov decision processes (MDPs) with an unknown cost function
and employ stochastic convex optimization tools to address the problem of imitation …