Actor-critic policy optimization in partially observable multiagent environments
Optimization of parameterized policies for reinforcement learning (RL) is an important and
challenging problem in artificial intelligence. Among the most common approaches are …
Deep active inference as variational policy gradients
B Millidge - Journal of Mathematical Psychology, 2020 - Elsevier
Active Inference is a theory arising from theoretical neuroscience which casts action and
planning as Bayesian inference problems to be solved by minimizing a single quantity—the …
Credit assignment for collective multiagent RL with global rewards
Scaling decision theoretic planning to large multiagent systems is challenging due to
uncertainty and partial observability in the environment. We focus on a multiagent planning …
Reducing estimation bias via triplet-average deep deterministic policy gradient
The overestimation caused by function approximation is a well-known property in Q-learning
algorithms, especially in single-critic models, which leads to poor performance in practical …
Action-depedent control variates for policy optimization via stein's identity
Policy gradient methods have achieved remarkable successes in solving challenging
reinforcement learning problems. However, it still often suffers from the large variance issue …
A reinforcement learning approach to rare trajectory sampling
Very often when studying non-equilibrium systems one is interested in analysing dynamical
behaviour that occurs with very low probability, so called rare events. In practice, since rare …
Hindsight learning for mdps with exogenous inputs
Many resource management problems require sequential decision-making under
uncertainty, where the only uncertainty affecting the decision outcomes are exogenous …
Importance sampling in reinforcement learning with an estimated behavior policy
In reinforcement learning, importance sampling is a widely used method for evaluating an
expectation under the distribution of data of one policy when the data has in fact been …
Expected policy gradients
K Ciosek, S Whiteson - Proceedings of the AAAI Conference on …, 2018 - ojs.aaai.org
We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG)
and deterministic policy gradients (DPG) for reinforcement learning. Inspired by expected …
Actor–critic reinforcement learning and application in developing computer-vision-based interface tracking
This paper synchronizes control theory with computer vision by formalizing object tracking
as a sequential decision-making process. A reinforcement learning (RL) agent successfully …