Decentralized Stochastic Control in Standard Borel Spaces: Centralized MDP Reductions, Near Optimality of Finite Window Local Information, and Q-Learning

O Mrani-Zentar, S Yüksel - arXiv preprint arXiv:2408.13828, 2024 - arxiv.org
Decentralized stochastic control problems are intrinsically difficult to study because of the
inapplicability of standard tools from centralized control such as dynamic programming and …

Structural Results and Applications for Perturbed Markov Chains

D Vial - 2020 - deepblue.lib.umich.edu
Each day, most of us interact with a myriad of networks: we search for information on the
web, connect with friends on social media platforms, and power our homes using the …

Empirical policy iteration for approximate dynamic programming

WB Haskell, R Jain, D Kalathil - 53rd IEEE Conference on …, 2014 - ieeexplore.ieee.org
We propose a simulation based algorithm, Empirical Policy Iteration (EPI) algorithm, for
finding the optimal policy function of an MDP with infinite horizon discounted cost criteria …

[BOOK][B] Finite state approximation for a class of POMDPs and a comparison of reinforcement learning algorithms for energy storage management of renewable …

M Mannan - 2014 - search.proquest.com
This thesis consists of two parts. In the first part, we investigate numerical solution of partially
observable Markov decision processes (POMDPs). POMDP is a modelling technique which …