- Academic resource search

Actor-critic policy optimization in partially observable multiagent environments

S Srinivasan, M Lanctot, V Zambaldi… - Advances in neural …, 2018 - proceedings.neurips.cc

Optimization of parameterized policies for reinforcement learning (RL) is an important and
challenging problem in artificial intelligence. Among the most common approaches are …

Cited by 173 · Related articles · All 9 versions

[PDF] arxiv.org

Deep active inference as variational policy gradients

B Millidge - Journal of Mathematical Psychology, 2020 - Elsevier

Active Inference is a theory arising from theoretical neuroscience which casts action and
planning as Bayesian inference problems to be solved by minimizing a single quantity—the …

Cited by 122 · Related articles · All 6 versions

[PDF] neurips.cc

Credit assignment for collective multiagent RL with global rewards

DT Nguyen, A Kumar, HC Lau - Advances in neural …, 2018 - proceedings.neurips.cc

Scaling decision theoretic planning to large multiagent systems is challenging due to
uncertainty and partial observability in the environment. We focus on a multiagent planning …

Cited by 122 · Related articles · All 10 versions

[PDF] smu.edu.sg

Reducing estimation bias via triplet-average deep deterministic policy gradient

D Wu, X Dong, J Shen, SCH Hoi - IEEE transactions on neural …, 2020 - ieeexplore.ieee.org

The overestimation caused by function approximation is a well-known property in Q-learning
algorithms, especially in single-critic models, which leads to poor performance in practical …

Cited by 81 · Related articles · All 4 versions

[PDF] arxiv.org

Action-dependent control variates for policy optimization via Stein's identity

H Liu, Y Feng, Y Mao, D Zhou, J Peng, Q Liu - arXiv preprint arXiv …, 2017 - arxiv.org

Policy gradient methods have achieved remarkable successes in solving challenging
reinforcement learning problems. However, it still often suffers from the large variance issue …

Cited by 99 · Related articles · All 8 versions

[PDF] iop.org

A reinforcement learning approach to rare trajectory sampling

DC Rose, JF Mair, JP Garrahan - New Journal of Physics, 2021 - iopscience.iop.org

Very often when studying non-equilibrium systems one is interested in analysing dynamical
behaviour that occurs with very low probability, so-called rare events. In practice, since rare …

Cited by 66 · Related articles · All 6 versions

[PDF] mlr.press

Hindsight learning for MDPs with exogenous inputs

SR Sinclair, FV Frujeri, CA Cheng… - International …, 2023 - proceedings.mlr.press

Many resource management problems require sequential decision-making under
uncertainty, where the only uncertainty affecting the decision outcomes are exogenous …

Cited by 20 · Related articles · All 7 versions

[PDF] springer.com

Importance sampling in reinforcement learning with an estimated behavior policy

JP Hanna, S Niekum, P Stone - Machine Learning, 2021 - Springer

In reinforcement learning, importance sampling is a widely used method for evaluating an
expectation under the distribution of data of one policy when the data has in fact been …

Cited by 37 · Related articles · All 13 versions

[PDF] aaai.org

Expected policy gradients

K Ciosek, S Whiteson - Proceedings of the AAAI Conference on …, 2018 - ojs.aaai.org

We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG)
and deterministic policy gradients (DPG) for reinforcement learning. Inspired by expected …

Cited by 88 · Related articles · All 11 versions

[HTML] sciencedirect.com

[HTML] Actor–critic reinforcement learning and application in developing computer-vision-based interface tracking

O Dogru, K Velswamy, B Huang - Engineering, 2021 - Elsevier

This paper synchronizes control theory with computer vision by formalizing object tracking
as a sequential decision-making process. A reinforcement learning (RL) agent successfully …

Cited by 26 · Related articles · All 5 versions