Finite sample analyses for TD(0) with function approximation

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library

The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …

Cited by 126 · Related articles · All 13 versions

Policy gradient method for robust reinforcement learning

Y Wang, S Zou - International conference on machine …, 2022 - proceedings.mlr.press

This paper develops the first policy gradient method with global optimality guarantee and
complexity analysis for robust reinforcement learning under model mismatch. Robust …

Cited by 60 · Related articles · All 7 versions

Online robust reinforcement learning with model uncertainty

Y Wang, S Zou - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc

Robust reinforcement learning (RL) is to find a policy that optimizes the worst-case
performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust …

Cited by 89 · Related articles · All 10 versions

A finite time analysis of temporal difference learning with linear function approximation

J Bhandari, D Russo, R Singal - Conference on learning …, 2018 - proceedings.mlr.press

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value
function corresponding to a given policy in a Markov decision process. Although TD is one of …

Cited by 396 · Related articles · All 11 versions

Federated reinforcement learning: Linear speedup under Markovian sampling

S Khodadadian, P Sharma, G Joshi… - International …, 2022 - proceedings.mlr.press

Since reinforcement learning algorithms are notoriously data-intensive, the task of sampling
observations from the environment is usually split across multiple agents. However …

Cited by 52 · Related articles · All 7 versions

CRPO: A new approach for safe reinforcement learning with convergence guarantee

T Xu, Y Liang, G Lan - International Conference on Machine …, 2021 - proceedings.mlr.press

In safe reinforcement learning (SRL) problems, an agent explores the environment to
maximize an expected total reward and meanwhile avoids violation of certain constraints on …

Cited by 126 · Related articles · All 7 versions

Finite-time error bounds for linear stochastic approximation and TD learning

R Srikant, L Ying - Conference on Learning Theory, 2019 - proceedings.mlr.press

We consider the dynamics of a linear stochastic approximation algorithm driven by
Markovian noise, and derive finite-time bounds on the moments of the error, i.e., deviation of …

Cited by 267 · Related articles · All 6 versions

Finite-sample analysis for SARSA with linear function approximation

S Zou, T Xu, Y Liang - Advances in neural information …, 2019 - proceedings.neurips.cc

SARSA is an on-policy algorithm to learn a Markov decision process policy in reinforcement
learning. We investigate the SARSA algorithm with linear function approximation under the …

Cited by 195 · Related articles · All 9 versions

Count-based exploration with the successor representation

MC Machado, MG Bellemare, M Bowling - Proceedings of the AAAI …, 2020 - ojs.aaai.org

In this paper we introduce a simple approach for exploration in reinforcement learning (RL)
that allows us to develop theoretically justified algorithms in the tabular case but that is also …

Cited by 193 · Related articles · All 10 versions

Breaking the sample size barrier in model-based reinforcement learning with a generative model

G Li, Y Wei, Y Chi, Y Gu… - Advances in neural …, 2020 - proceedings.neurips.cc

We investigate the sample efficiency of reinforcement learning in a $\gamma$-discounted
infinite-horizon Markov decision process (MDP) with state space S and action space A …

Cited by 133 · Related articles · All 10 versions