Why generalization in RL is difficult: Epistemic POMDPs and implicit partial observability

D Ghosh, J Rahme, A Kumar, A Zhang… - Advances in neural …, 2021 - proceedings.neurips.cc
Generalization is a central challenge for the deployment of reinforcement learning (RL)
systems in the real world. In this paper, we show that the sequential structure of the RL …

F2A2: Flexible fully-decentralized approximate actor-critic for cooperative multi-agent reinforcement learning

W Li, B Jin, X Wang, J Yan, H Zha - Journal of Machine Learning Research, 2023 - jmlr.org
Traditional centralized multi-agent reinforcement learning (MARL) algorithms are sometimes
impractical in complicated applications due to non-interactivity between agents, the curse of …

The geometry of memoryless stochastic policy optimization in infinite-horizon POMDPs

J Müller, G Montúfar - arXiv preprint arXiv:2110.07409, 2021 - arxiv.org
We consider the problem of finding the best memoryless stochastic policy for an infinite-
horizon partially observable Markov decision process (POMDP) with finite state and action …
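As a rough sketch of the problem this entry describes (the notation is mine, not necessarily the paper's): a memoryless stochastic policy maps each observation directly to a distribution over actions, and the quantity optimized is the expected long-term reward under that policy, e.g. in the discounted case

  \max_{\pi : O \to \Delta(A)} \; J(\pi) \;=\; \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big], \qquad a_t \sim \pi(\cdot \mid o_t), \quad o_t \sim \nu(\cdot \mid s_t),

where \nu is the observation kernel; the policy conditions only on the current observation o_t, never on the history.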

Policy gradient in partially observable environments: Approximation and convergence

K Azizzadenesheli, Y Yue, A Anandkumar - arXiv preprint arXiv …, 2018 - arxiv.org
Policy gradient is a generic and flexible reinforcement learning approach that generally
enjoys simplicity in analysis, implementation, and deployment. In the last few decades, this …
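For orientation only, a standard textbook form of the policy gradient (not a formula taken from this particular paper): the return is differentiated through the score function of the policy, which in a partially observable setting conditions on observations rather than states,

  \nabla_{\theta} J(\theta) \;=\; \mathbb{E}_{\tau \sim \pi_{\theta}}\Big[\sum_{t \ge 0} \nabla_{\theta} \log \pi_{\theta}(a_t \mid o_t)\, G_t\Big], \qquad G_t = \sum_{k \ge t} \gamma^{k-t} r_k,

and the estimator is typically averaged over sampled trajectories \tau.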

Active Inference and Reinforcement Learning: A Unified Inference on Continuous State and Action Spaces under Partial Observability

P Malekzadeh, KN Plataniotis - Neural Computation, 2024 - direct.mit.edu
Reinforcement learning (RL) has garnered significant attention for developing decision-
making agents that aim to maximize rewards, specified by an external supervisor, within fully …

Geometry of Optimization in Markov Decision Processes and Neural Network-Based PDE Solvers

J Müller - 2023 - ul.qucosa.de
This thesis is divided into two parts, dealing with optimization problems in
Markov decision processes (MDPs) and different neural network-based numerical solvers …

Open problem: Approximate planning of POMDPs in the class of memoryless policies

K Azizzadenesheli, A Lazaric… - … on Learning Theory, 2016 - proceedings.mlr.press
Planning plays an important role in the broad class of decision-theoretic problems and has drawn
much attention in recent work in the robotics and sequential decision-making areas …

Algebraic optimization of sequential decision problems

M Dressler, M Garrote-López, G Montúfar… - arXiv preprint arXiv …, 2022 - arxiv.org
We study the optimization of the expected long-term reward in finite partially observable
Markov decision processes over the set of stationary stochastic policies. In the case of …
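A hedged sketch of why an algebraic treatment is natural here (my notation, assuming a finite POMDP and memoryless stationary policies): the feasible set is a product of simplices and the long-term reward is a rational function of the policy entries,

  \pi \in \Delta(A)^{O}, \qquad R(\pi) \;=\; \sum_{s,a} \mu^{\pi}(s, a)\, r(s, a),

where \mu^{\pi} is the stationary (or discounted) state-action distribution induced by \pi, whose entries are rational functions of the parameters \pi(a \mid o).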

Learning Algorithms for Intelligent Agents and Mechanisms

J Rahme - 2022 - search.proquest.com
The ability to learn from past experiences and adapt one's behavior accordingly within an
environment or context to achieve a certain goal is a characteristic of a truly intelligent entity …

Combining information-seeking exploration and reward maximization: Unified inference on continuous state and action spaces under partial observability

P Malekzadeh, KN Plataniotis - arXiv preprint arXiv:2212.07946, 2022 - arxiv.org
Reinforcement learning (RL) has gained considerable attention for creating decision-making
agents that maximize rewards received from fully observable environments. However, many …