DualDICE: Behavior-agnostic estimation of discounted stationary distribution corrections

O Nachum, Y Chow, B Dai, L Li - Advances in neural …, 2019 - proceedings.neurips.cc

In many real-world reinforcement learning applications, access to the environment is limited
to a fixed dataset, instead of direct (online) interaction with the environment. When using this …

Cited by 373 · Related articles · All 9 versions

[PDF] arxiv.org

Deep reinforcement learning

SE Li - Reinforcement learning for sequential decision and …, 2023 - Springer

Similar to humans, RL agents use interactive learning to successfully obtain satisfactory
decision strategies. However, in many cases, it is desirable to learn directly from …

Cited by 424 · Related articles · All 9 versions

[PDF] mlr.press

Data-efficient off-policy policy evaluation for reinforcement learning

P Thomas, E Brunskill - International Conference on …, 2016 - proceedings.mlr.press

In this paper we present a new way of predicting the performance of a reinforcement
learning policy given historical data that may have been generated by a different policy. The …

Cited by 766 · Related articles · All 14 versions
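The off-policy evaluation problem in the Thomas and Brunskill snippet (predicting a policy's performance from historical data generated by a different policy) can be illustrated with the two classic trajectory-level importance-sampling estimators, ordinary (IS) and weighted (WIS). This is a minimal sketch, not the method of any paper listed here; the function name `trajectory_is_estimates`, the `(state, action, reward)` episode format, and the callable-policy interface are assumptions made for illustration.

```python
import numpy as np


def trajectory_is_estimates(trajectories, pi_e, pi_b, gamma=1.0):
    """Ordinary (IS) and weighted (WIS) importance-sampling estimates of the
    expected discounted return of evaluation policy pi_e, using episodes
    logged under behavior policy pi_b.

    Hypothetical interface (illustration only):
    trajectories: list of episodes, each a list of (state, action, reward).
    pi_e, pi_b:   callables (state, action) -> probability of that action.
    """
    weights, returns = [], []
    for episode in trajectories:
        rho = 1.0  # product over t of pi_e(a_t | s_t) / pi_b(a_t | s_t)
        g = 0.0    # discounted return actually observed in this episode
        for t, (s, a, r) in enumerate(episode):
            rho *= pi_e(s, a) / pi_b(s, a)
            g += (gamma ** t) * r
        weights.append(rho)
        returns.append(g)
    w = np.asarray(weights, dtype=float)
    g = np.asarray(returns, dtype=float)
    # IS: unbiased when pi_b > 0 wherever pi_e > 0, but variance grows with horizon.
    is_estimate = float(np.mean(w * g))
    # WIS: normalizes by the total weight; biased for finite data but consistent,
    # and typically much lower variance than IS.
    total_w = float(np.sum(w))
    wis_estimate = float(np.sum(w * g)) / total_w if total_w > 0 else float("nan")
    return is_estimate, wis_estimate
```

For a one-state, two-action toy with a uniform `pi_b`, a `pi_e` that picks action 1 with probability 0.9, and reward equal to the action index, logging one episode of each action gives 0.9 from both estimators, which is the true value of `pi_e`. If only action 1 is logged, IS returns 1.8 while WIS returns 1.0, showing how the normalization trades bias for variance.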

[PDF] arxiv.org

Q-prop: Sample-efficient policy gradient with an off-policy critic

S Gu, T Lillicrap, Z Ghahramani, RE Turner… - arXiv preprint arXiv …, 2016 - arxiv.org

Model-free deep reinforcement learning (RL) methods have been successful in a wide
variety of simulated domains. However, a major obstacle facing deep RL in the real world is …

Cited by 415 · Related articles · All 10 versions

[PDF] arxiv.org

Deep reinforcement learning based dynamic trajectory control for UAV-assisted mobile edge computing

L Wang, K Wang, C Pan, W Xu, N Aslam… - IEEE Transactions …, 2021 - ieeexplore.ieee.org

In this paper, we consider a platform of flying mobile edge computing (F-MEC), where
unmanned aerial vehicles (UAVs) serve as equipment providing computation resource, and …

Cited by 238 · Related articles · All 8 versions

[PDF] neurips.cc

Learning implicit credit assignment for cooperative multi-agent reinforcement learning

M Zhou, Z Liu, P Sui, Y Li… - Advances in neural …, 2020 - proceedings.neurips.cc

We present a multi-agent actor-critic method that aims to implicitly address the credit
assignment problem under fully cooperative settings. Our key motivation is that credit …

Cited by 145 · Related articles · All 7 versions

[PDF] mlr.press

More robust doubly robust off-policy evaluation

M Farajtabar, Y Chow… - … on Machine Learning, 2018 - proceedings.mlr.press

We study the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where
the goal is to estimate the performance of a policy from the data generated by another policy …

Cited by 289 · Related articles · All 7 versions
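The doubly robust (DR) estimators named in the Farajtabar et al. title combine importance weights with an approximate value model that acts as a control variate. The sketch below shows only the standard step-wise DR form with a given model `q_hat`; it does not implement the model-fitting procedure that the paper proposes. The function name `doubly_robust_estimate` and its arguments are hypothetical, chosen for illustration.

```python
import numpy as np


def doubly_robust_estimate(trajectories, pi_e, pi_b, q_hat, actions, gamma=1.0):
    """Step-wise doubly robust (DR) off-policy value estimate.

    Hypothetical interface (illustration only):
    trajectories: list of episodes, each a list of (state, action, reward).
    pi_e, pi_b:   callables (state, action) -> action probability.
    q_hat:        callable (state, action) -> approximate action value of pi_e,
                  assumed fit on separate data.
    actions:      all actions, used to form v_hat(s) = sum_a pi_e(a|s) q_hat(s, a).
    """
    per_episode = []
    for episode in trajectories:
        rho_prev = 1.0  # rho_{0:t-1}: cumulative ratio before step t (1 at t = 0)
        total = 0.0
        for t, (s, a, r) in enumerate(episode):
            rho = rho_prev * pi_e(s, a) / pi_b(s, a)  # rho_{0:t}
            v_hat = sum(pi_e(s, b) * q_hat(s, b) for b in actions)
            # Importance-weighted reward minus a control variate whose
            # expectation is zero when pi_b is the true logging policy.
            total += (gamma ** t) * (rho * r - (rho * q_hat(s, a) - rho_prev * v_hat))
            rho_prev = rho
        per_episode.append(total)
    return float(np.mean(per_episode))
```

With `q_hat` set to zero the estimate reduces to per-decision importance sampling; with an exact `q_hat` on a deterministic toy problem, every logged episode contributes exactly the true value, which is the variance reduction DR aims for.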

[PDF] jmlr.org

Double reinforcement learning for efficient off-policy evaluation in Markov decision processes

N Kallus, M Uehara - Journal of Machine Learning Research, 2020 - jmlr.org

Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision
policies without needing to conduct exploration, which is often costly or otherwise infeasible …

Cited by 208 · Related articles · All 7 versions

[PDF] researchgate.net

A novel DDPG method with prioritized experience replay

Y Hou, L Liu, Q Wei, X Xu… - 2017 IEEE international …, 2017 - ieeexplore.ieee.org

Recently, a state-of-the-art algorithm, called deep deterministic policy gradient (DDPG), has
achieved good performance in many continuous control tasks in the MuJoCo simulator. To …

Cited by 268 · Related articles · All 5 versions

[PDF] lins-cqupt.cn

Multi-agent deep reinforcement learning based UAV trajectory optimization for differentiated services

Z Ning, Y Yang, X Wang, Q Song, L Guo… - IEEE Transactions …, 2023 - ieeexplore.ieee.org

Driven by the increasing computational demand of real-time mobile applications, Unmanned
Aerial Vehicle (UAV) assisted Multi-access Edge Computing (MEC) has been envisioned as …

Cited by 45 · Related articles · All 4 versions