A finite time analysis of temporal difference learning with linear function approximation

K Zhang, Z Yang, T Başar - Handbook of reinforcement learning and …, 2021 - Springer

Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …

Cited by 1663 - Related articles - All 8 versions

[PDF] wiley.com

Recent advances in reinforcement learning in finance

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library

The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …

Cited by 196 - Related articles - All 13 versions

[BOOK][B] Control systems and reinforcement learning

S Meyn - 2022 - books.google.com

A high school student can create deep Q-learning code to control her robot, without any
understanding of the meaning of 'deep' or 'Q', or why the code sometimes fails. This book is …

Cited by 152 - Related articles - All 3 versions

[PDF] arxiv.org

A two-timescale stochastic algorithm framework for bilevel optimization: Complexity analysis and application to actor-critic

M Hong, HT Wai, Z Wang, Z Yang - SIAM Journal on Optimization, 2023 - SIAM

This paper analyzes a two-timescale stochastic algorithm framework for bilevel optimization.
Bilevel optimization is a class of problems which exhibits a two-level structure, and its goal is …

Cited by 309 - Related articles - All 5 versions

[PDF] neurips.cc

Closing the gap: Tighter analysis of alternating stochastic gradient methods for bilevel problems

T Chen, Y Sun, W Yin - Advances in Neural Information …, 2021 - proceedings.neurips.cc

Stochastic nested optimization, including stochastic compositional, min-max, and bilevel
optimization, is gaining popularity in many machine learning applications. While the three …

Cited by 125 - Related articles - All 6 versions

[PDF] mlr.press

Policy gradient method for robust reinforcement learning

Y Wang, S Zou - International conference on machine …, 2022 - proceedings.mlr.press

This paper develops the first policy gradient method with global optimality guarantee and
complexity analysis for robust reinforcement learning under model mismatch. Robust …

Cited by 75 - Related articles - All 7 versions

[PDF] neurips.cc

Online robust reinforcement learning with model uncertainty

Y Wang, S Zou - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc

Robust reinforcement learning (RL) is to find a policy that optimizes the worst-case
performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust …

Cited by 108 - Related articles - All 10 versions

[PDF] mlr.press

Federated reinforcement learning: Linear speedup under Markovian sampling

S Khodadadian, P Sharma, G Joshi… - International …, 2022 - proceedings.mlr.press

Since reinforcement learning algorithms are notoriously data-intensive, the task of sampling
observations from the environment is usually split across multiple agents. However …

Cited by 70 - Related articles - All 7 versions

[PDF] mlr.press

CRPO: A new approach for safe reinforcement learning with convergence guarantee

T Xu, Y Liang, G Lan - International Conference on Machine …, 2021 - proceedings.mlr.press

In safe reinforcement learning (SRL) problems, an agent explores the environment to
maximize an expected total reward and meanwhile avoids violation of certain constraints on …

Cited by 150 - Related articles - All 7 versions

[PDF] mlr.press

Finite-time error bounds for linear stochastic approximation and TD learning

R Srikant, L Ying - Conference on Learning Theory, 2019 - proceedings.mlr.press

We consider the dynamics of a linear stochastic approximation algorithm driven by
Markovian noise, and derive finite-time bounds on the moments of the error, i.e., deviation of …

Cited by 296 - Related articles - All 6 versions