A two-timescale stochastic algorithm framework for bilevel optimization: Complexity analysis and application to actor-critic

M Hong, HT Wai, Z Wang, Z Yang - SIAM Journal on Optimization, 2023 - SIAM
This paper analyzes a two-timescale stochastic algorithm framework for bilevel optimization.
Bilevel optimization is a class of problems which exhibits a two-level structure, and its goal is …

Online robust reinforcement learning with model uncertainty

Y Wang, S Zou - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc
Robust reinforcement learning (RL) is to find a policy that optimizes the worst-case
performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust …

CRPO: A new approach for safe reinforcement learning with convergence guarantee

T Xu, Y Liang, G Lan - International Conference on Machine …, 2021 - proceedings.mlr.press
In safe reinforcement learning (SRL) problems, an agent explores the environment to
maximize an expected total reward and meanwhile avoids violation of certain constraints on …

On finite-time convergence of actor-critic algorithm

S Qiu, Z Yang, J Ye, Z Wang - IEEE Journal on Selected Areas …, 2021 - ieeexplore.ieee.org
Actor-critic algorithms and their extensions have made great achievements in real-world
decision-making problems. In contrast to its empirical success, the theoretical understanding …

On the sample complexity of actor-critic method for reinforcement learning with function approximation

H Kumar, A Koppel, A Ribeiro - Machine Learning, 2023 - Springer
Reinforcement learning, mathematically described by Markov Decision Problems, may be
approached either through dynamic programming or policy search. Actor-critic algorithms …

Finite-time analysis of Whittle index based Q-learning for restless multi-armed bandits with neural network function approximation

G Xiong, J Li - Advances in Neural Information Processing …, 2023 - proceedings.neurips.cc
Whittle index policy is a heuristic to the intractable restless multi-armed bandits (RMAB)
problem. Although it is provably asymptotically optimal, finding Whittle indices remains …

Finite time analysis of linear two-timescale stochastic approximation with Markovian noise

M Kaledin, E Moulines, A Naumov… - … on Learning Theory, 2020 - proceedings.mlr.press
Linear two-timescale stochastic approximation (SA) scheme is an important class of
algorithms which has become popular in reinforcement learning (RL), particularly for the …

Scalable primal-dual actor-critic method for safe multi-agent RL with general utilities

D Ying, Y Zhang, Y Ding, A Koppel… - Advances in Neural …, 2024 - proceedings.neurips.cc
We investigate safe multi-agent reinforcement learning, where agents seek to collectively
maximize an aggregate sum of local objectives while satisfying their own safety constraints …

Non-asymptotic convergence analysis of two time-scale (natural) actor-critic algorithms

T Xu, Z Wang, Y Liang - arXiv preprint arXiv:2005.03557, 2020 - arxiv.org
As an important type of reinforcement learning algorithms, actor-critic (AC) and natural
actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies. In the first …

Finite-Time High-Probability Bounds for Polyak–Ruppert Averaged Iterates of Linear Stochastic Approximation

A Durmus, E Moulines, A Naumov… - Mathematics of …, 2024 - pubsonline.informs.org
This paper provides a finite-time analysis of linear stochastic approximation (LSA)
algorithms with fixed step size, a core method in statistics and machine learning. LSA is …