A tale of two-timescale reinforcement learning with the tightest finite-time bound

A two-timescale stochastic algorithm framework for bilevel optimization: Complexity analysis and application to actor-critic

M Hong, HT Wai, Z Wang, Z Yang - SIAM Journal on Optimization, 2023 - SIAM

This paper analyzes a two-timescale stochastic algorithm framework for bilevel optimization.
Bilevel optimization is a class of problems which exhibits a two-level structure, and its goal is …

Cited by 257 · Related articles · All 5 versions

Online robust reinforcement learning with model uncertainty

Y Wang, S Zou - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc

Robust reinforcement learning (RL) is to find a policy that optimizes the worst-case
performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust …

Cited by 89 · Related articles · All 10 versions

CRPO: A new approach for safe reinforcement learning with convergence guarantee

T Xu, Y Liang, G Lan - International Conference on Machine …, 2021 - proceedings.mlr.press

In safe reinforcement learning (SRL) problems, an agent explores the environment to
maximize an expected total reward and meanwhile avoids violation of certain constraints on …

Cited by 126 · Related articles · All 7 versions

On finite-time convergence of actor-critic algorithm

S Qiu, Z Yang, J Ye, Z Wang - IEEE Journal on Selected Areas …, 2021 - ieeexplore.ieee.org

Actor-critic algorithm and their extensions have made great achievements in real-world
decision-making problems. In contrast to its empirical success, the theoretical understanding …

Cited by 76 · Related articles · All 2 versions

On the sample complexity of actor-critic method for reinforcement learning with function approximation

H Kumar, A Koppel, A Ribeiro - Machine Learning, 2023 - Springer

Reinforcement learning, mathematically described by Markov Decision Problems, may be
approached either through dynamic programming or policy search. Actor-critic algorithms …

Cited by 106 · Related articles · All 5 versions

Finite-time analysis of Whittle index based Q-learning for restless multi-armed bandits with neural network function approximation

G Xiong, J Li - Advances in Neural Information Processing …, 2023 - proceedings.neurips.cc

Whittle index policy is a heuristic to the intractable restless multi-armed bandits (RMAB)
problem. Although it is provably asymptotically optimal, finding Whittle indices remains …

Cited by 6 · Related articles · All 8 versions

Finite time analysis of linear two-timescale stochastic approximation with Markovian noise

M Kaledin, E Moulines, A Naumov… - … on Learning Theory, 2020 - proceedings.mlr.press

Linear two-timescale stochastic approximation (SA) scheme is an important class of
algorithms which has become popular in reinforcement learning (RL), particularly for the …

Cited by 80 · Related articles · All 7 versions

Scalable primal-dual actor-critic method for safe multi-agent RL with general utilities

D Ying, Y Zhang, Y Ding, A Koppel… - Advances in Neural …, 2024 - proceedings.neurips.cc

We investigate safe multi-agent reinforcement learning, where agents seek to collectively
maximize an aggregate sum of local objectives while satisfying their own safety constraints …

Cited by 6 · Related articles · All 6 versions

Non-asymptotic convergence analysis of two time-scale (natural) actor-critic algorithms

T Xu, Z Wang, Y Liang - arXiv preprint arXiv:2005.03557, 2020 - arxiv.org

As an important type of reinforcement learning algorithms, actor-critic (AC) and natural actor-critic (NAC)
algorithms are often executed in two ways for finding optimal policies. In the first …

Cited by 60 · Related articles · All 3 versions

Finite-Time High-Probability Bounds for Polyak–Ruppert Averaged Iterates of Linear Stochastic Approximation

A Durmus, E Moulines, A Naumov… - Mathematics of …, 2024 - pubsonline.informs.org

This paper provides a finite-time analysis of linear stochastic approximation (LSA)
algorithms with fixed step size, a core method in statistics and machine learning. LSA is …

Cited by 17 · Related articles · All 3 versions