Actor-critic policy optimization in partially observable multiagent environments

S Srinivasan, M Lanctot, V Zambaldi… - Advances in neural …, 2018 - proceedings.neurips.cc
Optimization of parameterized policies for reinforcement learning (RL) is an important and
challenging problem in artificial intelligence. Among the most common approaches are …

Deep active inference as variational policy gradients

B Millidge - Journal of Mathematical Psychology, 2020 - Elsevier
Active Inference is a theory arising from theoretical neuroscience which casts action and
planning as Bayesian inference problems to be solved by minimizing a single quantity—the …

Credit assignment for collective multiagent RL with global rewards

DT Nguyen, A Kumar, HC Lau - Advances in neural …, 2018 - proceedings.neurips.cc
Scaling decision theoretic planning to large multiagent systems is challenging due to
uncertainty and partial observability in the environment. We focus on a multiagent planning …

Reducing estimation bias via triplet-average deep deterministic policy gradient

D Wu, X Dong, J Shen, SCH Hoi - IEEE transactions on neural …, 2020 - ieeexplore.ieee.org
The overestimation caused by function approximation is a well-known property in Q-learning
algorithms, especially in single-critic models, which leads to poor performance in practical …

Action-dependent control variates for policy optimization via Stein's identity

H Liu, Y Feng, Y Mao, D Zhou, J Peng, Q Liu - arXiv preprint arXiv …, 2017 - arxiv.org
Policy gradient methods have achieved remarkable successes in solving challenging
reinforcement learning problems. However, it still often suffers from the large variance issue …

A reinforcement learning approach to rare trajectory sampling

DC Rose, JF Mair, JP Garrahan - New Journal of Physics, 2021 - iopscience.iop.org
Very often when studying non-equilibrium systems one is interested in analysing dynamical
behaviour that occurs with very low probability, so-called rare events. In practice, since rare …

Hindsight learning for MDPs with exogenous inputs

SR Sinclair, FV Frujeri, CA Cheng… - International …, 2023 - proceedings.mlr.press
Many resource management problems require sequential decision-making under
uncertainty, where the only uncertainty affecting the decision outcomes are exogenous …

Importance sampling in reinforcement learning with an estimated behavior policy

JP Hanna, S Niekum, P Stone - Machine Learning, 2021 - Springer
In reinforcement learning, importance sampling is a widely used method for evaluating an
expectation under the distribution of data of one policy when the data has in fact been …

Expected policy gradients

K Ciosek, S Whiteson - Proceedings of the AAAI Conference on …, 2018 - ojs.aaai.org
We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG)
and deterministic policy gradients (DPG) for reinforcement learning. Inspired by expected …

Actor–critic reinforcement learning and application in developing computer-vision-based interface tracking

O Dogru, K Velswamy, B Huang - Engineering, 2021 - Elsevier
This paper synchronizes control theory with computer vision by formalizing object tracking
as a sequential decision-making process. A reinforcement learning (RL) agent successfully …