Recent advances in reinforcement learning in finance
The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …
Policy gradient method for robust reinforcement learning
This paper develops the first policy gradient method with global optimality guarantee and
complexity analysis for robust reinforcement learning under model mismatch. Robust …
Online robust reinforcement learning with model uncertainty
Robust reinforcement learning (RL) aims to find a policy that optimizes the worst-case
performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust …
A finite time analysis of temporal difference learning with linear function approximation
Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value
function corresponding to a given policy in a Markov decision process. Although TD is one of …
Federated reinforcement learning: Linear speedup under Markovian sampling
Since reinforcement learning algorithms are notoriously data-intensive, the task of sampling
observations from the environment is usually split across multiple agents. However …
CRPO: A new approach for safe reinforcement learning with convergence guarantee
In safe reinforcement learning (SRL) problems, an agent explores the environment to
maximize an expected total reward and meanwhile avoids violation of certain constraints on …
Finite-time error bounds for linear stochastic approximation and TD learning
We consider the dynamics of a linear stochastic approximation algorithm driven by
Markovian noise, and derive finite-time bounds on the moments of the error, i.e., deviation of …
Finite-sample analysis for SARSA with linear function approximation
SARSA is an on-policy algorithm to learn a Markov decision process policy in reinforcement
learning. We investigate the SARSA algorithm with linear function approximation under the …
Count-based exploration with the successor representation
In this paper we introduce a simple approach for exploration in reinforcement learning (RL)
that allows us to develop theoretically justified algorithms in the tabular case but that is also …
Breaking the sample size barrier in model-based reinforcement learning with a generative model
We investigate the sample efficiency of reinforcement learning in a $\gamma $-discounted
infinite-horizon Markov decision process (MDP) with state space S and action space A …