Why generalization in RL is difficult: Epistemic POMDPs and implicit partial observability

D Ghosh, J Rahme, A Kumar, A Zhang… - Advances in neural …, 2021 - proceedings.neurips.cc
Generalization is a central challenge for the deployment of reinforcement learning (RL)
systems in the real world. In this paper, we show that the sequential structure of the RL …

F2A2: Flexible fully-decentralized approximate actor-critic for cooperative multi-agent reinforcement learning

W Li, B Jin, X Wang, J Yan, H Zha - Journal of Machine Learning Research, 2023 - jmlr.org
Traditional centralized multi-agent reinforcement learning (MARL) algorithms are sometimes
impractical in complicated applications due to non-interactivity between agents, the curse of …

The geometry of memoryless stochastic policy optimization in infinite-horizon POMDPs

J Müller, G Montúfar - arXiv preprint arXiv:2110.07409, 2021 - arxiv.org
We consider the problem of finding the best memoryless stochastic policy for an infinite-
horizon partially observable Markov decision process (POMDP) with finite state and action …
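As a rough sketch of the problem this entry describes (the notation is mine, not necessarily the paper's): a memoryless stochastic policy maps each observation directly to a distribution over actions, and the quantity optimized is the expected long-term reward under that policy, e.g. in the discounted case

  \max_{\pi : O \to \Delta(A)} \; J(\pi) \;=\; \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big], \qquad a_t \sim \pi(\cdot \mid o_t), \quad o_t \sim \nu(\cdot \mid s_t),

where \nu is the observation kernel; the policy conditions only on the current observation o_t, never on the history.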

Policy gradient in partially observable environments: Approximation and convergence

K Azizzadenesheli, Y Yue, A Anandkumar - arXiv preprint arXiv …, 2018 - arxiv.org
Policy gradient is a generic and flexible reinforcement learning approach that generally
enjoys simplicity in analysis, implementation, and deployment. In the last few decades, this …
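For orientation only, a standard textbook form of the policy gradient (not a formula taken from this particular paper): the return is differentiated through the score function of the policy, which in a partially observable setting conditions on observations rather than states,

  \nabla_{\theta} J(\theta) \;=\; \mathbb{E}_{\tau \sim \pi_{\theta}}\Big[\sum_{t \ge 0} \nabla_{\theta} \log \pi_{\theta}(a_t \mid o_t)\, G_t\Big], \qquad G_t = \sum_{k \ge t} \gamma^{k-t} r_k,

and the estimator is typically averaged over sampled trajectories \tau.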

Active Inference and Reinforcement Learning: A Unified Inference on Continuous State and Action Spaces under Partial Observability

P Malekzadeh, KN Plataniotis - Neural Computation, 2024 - direct.mit.edu
Reinforcement learning (RL) has garnered significant attention for developing decision-
making agents that aim to maximize rewards, specified by an external supervisor, within fully …

Geometry of Optimization in Markov Decision Processes and Neural Network-Based PDE Solvers

J Müller - 2023 - ul.qucosa.de
This thesis is divided into two parts, dealing with optimization problems in
Markov decision processes (MDPs) and different neural network-based numerical solvers …

Open problem: Approximate planning of POMDPs in the class of memoryless policies

K Azizzadenesheli, A Lazaric… - … on Learning Theory, 2016 - proceedings.mlr.press
Planning plays an important role in the broad class of decision-theoretic problems and has drawn
much attention in recent work in the robotics and sequential decision-making areas …

Algebraic optimization of sequential decision problems

M Dressler, M Garrote-López, G Montúfar… - arXiv preprint arXiv …, 2022 - arxiv.org
We study the optimization of the expected long-term reward in finite partially observable
Markov decision processes over the set of stationary stochastic policies. In the case of …
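A hedged sketch of why an algebraic treatment is natural here (my notation, assuming a finite POMDP and memoryless stationary policies): the feasible set is a product of simplices and the long-term reward is a rational function of the policy entries,

  \pi \in \Delta(A)^{O}, \qquad R(\pi) \;=\; \sum_{s,a} \mu^{\pi}(s, a)\, r(s, a),

where \mu^{\pi} is the stationary (or discounted) state-action distribution induced by \pi, whose entries are rational functions of the parameters \pi(a \mid o).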

Learning Algorithms for Intelligent Agents and Mechanisms

J Rahme - 2022 - search.proquest.com
The ability to learn from past experiences and adapt one's behavior accordingly within an
environment or context to achieve a certain goal is a characteristic of a truly intelligent entity …

Combining information-seeking exploration and reward maximization: Unified inference on continuous state and action spaces under partial observability

P Malekzadeh, KN Plataniotis - arXiv preprint arXiv:2212.07946, 2022 - arxiv.org
Reinforcement learning (RL) has gained considerable attention for creating decision-making
agents that maximize rewards received from fully observable environments. However, many …