DualDICE: Behavior-agnostic estimation of discounted stationary distribution corrections

O Nachum, Y Chow, B Dai, L Li - Advances in neural …, 2019 - proceedings.neurips.cc
In many real-world reinforcement learning applications, access to the environment is limited
to a fixed dataset, instead of direct (online) interaction with the environment. When using this …

Deep reinforcement learning

SE Li - Reinforcement learning for sequential decision and …, 2023 - Springer
Similar to humans, RL agents use interactive learning to successfully obtain satisfactory
decision strategies. However, in many cases, it is desirable to learn directly from …

Data-efficient off-policy policy evaluation for reinforcement learning

P Thomas, E Brunskill - International Conference on …, 2016 - proceedings.mlr.press
In this paper we present a new way of predicting the performance of a reinforcement
learning policy given historical data that may have been generated by a different policy. The …

Q-Prop: Sample-efficient policy gradient with an off-policy critic

S Gu, T Lillicrap, Z Ghahramani, RE Turner… - arXiv preprint arXiv …, 2016 - arxiv.org
Model-free deep reinforcement learning (RL) methods have been successful in a wide
variety of simulated domains. However, a major obstacle facing deep RL in the real world is …

Deep reinforcement learning based dynamic trajectory control for UAV-assisted mobile edge computing

L Wang, K Wang, C Pan, W Xu, N Aslam… - IEEE Transactions …, 2021 - ieeexplore.ieee.org
In this paper, we consider a platform of flying mobile edge computing (F-MEC), where
unmanned aerial vehicles (UAVs) serve as equipment providing computation resource, and …

Learning implicit credit assignment for cooperative multi-agent reinforcement learning

M Zhou, Z Liu, P Sui, Y Li… - Advances in neural …, 2020 - proceedings.neurips.cc
We present a multi-agent actor-critic method that aims to implicitly address the credit
assignment problem under fully cooperative settings. Our key motivation is that credit …

More robust doubly robust off-policy evaluation

M Farajtabar, Y Chow… - … on Machine Learning, 2018 - proceedings.mlr.press
We study the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where
the goal is to estimate the performance of a policy from the data generated by another policy …

Double reinforcement learning for efficient off-policy evaluation in Markov decision processes

N Kallus, M Uehara - Journal of Machine Learning Research, 2020 - jmlr.org
Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision
policies without needing to conduct exploration, which is often costly or otherwise infeasible …

A novel DDPG method with prioritized experience replay

Y Hou, L Liu, Q Wei, X Xu… - 2017 IEEE international …, 2017 - ieeexplore.ieee.org
Recently, a state-of-the-art algorithm, called deep deterministic policy gradient (DDPG), has
achieved good performance in many continuous control tasks in the MuJoCo simulator. To …

Multi-agent deep reinforcement learning based UAV trajectory optimization for differentiated services

Z Ning, Y Yang, X Wang, Q Song, L Guo… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Driven by the increasing computational demand of real-time mobile applications, Unmanned
Aerial Vehicle (UAV) assisted Multi-access Edge Computing (MEC) has been envisioned as …