DualDICE: Behavior-agnostic estimation of discounted stationary distribution corrections
In many real-world reinforcement learning applications, access to the environment is limited
to a fixed dataset, instead of direct (online) interaction with the environment. When using this …
Deep reinforcement learning
SE Li - Reinforcement learning for sequential decision and …, 2023 - Springer
Similar to humans, RL agents use interactive learning to successfully obtain satisfactory
decision strategies. However, in many cases, it is desirable to learn directly from …
Data-efficient off-policy policy evaluation for reinforcement learning
P Thomas, E Brunskill - International Conference on …, 2016 - proceedings.mlr.press
In this paper we present a new way of predicting the performance of a reinforcement
learning policy given historical data that may have been generated by a different policy. The …
Q-Prop: Sample-efficient policy gradient with an off-policy critic
Model-free deep reinforcement learning (RL) methods have been successful in a wide
variety of simulated domains. However, a major obstacle facing deep RL in the real world is …
Deep reinforcement learning based dynamic trajectory control for UAV-assisted mobile edge computing
In this paper, we consider a platform of flying mobile edge computing (F-MEC), where
unmanned aerial vehicles (UAVs) serve as equipment providing computation resource, and …
Learning implicit credit assignment for cooperative multi-agent reinforcement learning
We present a multi-agent actor-critic method that aims to implicitly address the credit
assignment problem under fully cooperative settings. Our key motivation is that credit …
More robust doubly robust off-policy evaluation
M Farajtabar, Y Chow… - … on Machine Learning, 2018 - proceedings.mlr.press
We study the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where
the goal is to estimate the performance of a policy from the data generated by another policy …
Double reinforcement learning for efficient off-policy evaluation in Markov decision processes
Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision
policies without needing to conduct exploration, which is often costly or otherwise infeasible …
A novel DDPG method with prioritized experience replay
Recently, a state-of-the-art algorithm, called deep deterministic policy gradient (DDPG), has
achieved good performance in many continuous control tasks in the MuJoCo simulator. To …
Multi-agent deep reinforcement learning based UAV trajectory optimization for differentiated services
Driven by the increasing computational demand of real-time mobile applications, Unmanned
Aerial Vehicle (UAV) assisted Multi-access Edge Computing (MEC) has been envisioned as …