A two-timescale stochastic algorithm framework for bilevel optimization: Complexity analysis and application to actor-critic
This paper analyzes a two-timescale stochastic algorithm framework for bilevel optimization.
Bilevel optimization is a class of problems which exhibits a two-level structure, and its goal is …
Online robust reinforcement learning with model uncertainty
Robust reinforcement learning (RL) aims to find a policy that optimizes the worst-case
performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust …
CRPO: A new approach for safe reinforcement learning with convergence guarantee
In safe reinforcement learning (SRL) problems, an agent explores the environment to
maximize an expected total reward and meanwhile avoids violation of certain constraints on …
On finite-time convergence of actor-critic algorithm
Actor-critic algorithms and their extensions have made great achievements in real-world
decision-making problems. In contrast to its empirical success, the theoretical understanding …
On the sample complexity of actor-critic method for reinforcement learning with function approximation
Reinforcement learning, mathematically described by Markov Decision Problems, may be
approached either through dynamic programming or policy search. Actor-critic algorithms …
Finite-time analysis of Whittle index based Q-learning for restless multi-armed bandits with neural network function approximation
G Xiong, J Li - Advances in Neural Information Processing …, 2023 - proceedings.neurips.cc
Whittle index policy is a heuristic to the intractable restless multi-armed bandits (RMAB)
problem. Although it is provably asymptotically optimal, finding Whittle indices remains …
Finite time analysis of linear two-timescale stochastic approximation with Markovian noise
Linear two-timescale stochastic approximation (SA) scheme is an important class of
algorithms which has become popular in reinforcement learning (RL), particularly for the …
Scalable primal-dual actor-critic method for safe multi-agent RL with general utilities
We investigate safe multi-agent reinforcement learning, where agents seek to collectively
maximize an aggregate sum of local objectives while satisfying their own safety constraints …
Non-asymptotic convergence analysis of two time-scale (natural) actor-critic algorithms
As an important type of reinforcement learning algorithms, actor-critic (AC) and natural actor-
critic (NAC) algorithms are often executed in two ways for finding optimal policies. In the first …
Finite-Time High-Probability Bounds for Polyak–Ruppert Averaged Iterates of Linear Stochastic Approximation
This paper provides a finite-time analysis of linear stochastic approximation (LSA)
algorithms with fixed step size, a core method in statistics and machine learning. LSA is …