Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity
Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …
The curious price of distributional robustness in reinforcement learning with a generative model
This paper investigates model robustness in reinforcement learning (RL) via the framework
of distributionally robust Markov decision processes (RMDPs). Despite recent efforts, the …
Settling the sample complexity of model-based offline reinforcement learning
The Annals of Statistics 2024, Vol. 52, No. 1, 233–260 https://doi.org/10.1214/23-AOS2342 © …
Federated reinforcement learning: Linear speedup under Markovian sampling
Since reinforcement learning algorithms are notoriously data-intensive, the task of sampling
observations from the environment is usually split across multiple agents. However …
Breaking the sample size barrier in model-based reinforcement learning with a generative model
We investigate the sample efficiency of reinforcement learning in a $\gamma$-discounted
infinite-horizon Markov decision process (MDP) with state space S and action space A …
Is Q-learning minimax optimal? A tight sample complexity analysis
Q-learning, which seeks to learn the optimal Q-function of a Markov decision process (MDP)
in a model-free fashion, lies at the heart of reinforcement learning. When it comes to the …
The blessing of heterogeneity in federated Q-learning: Linear speedup and beyond
In this paper, we consider federated Q-learning, which aims to learn an optimal Q-function
by periodically aggregating local Q-estimates trained on local data alone. Focusing on …
The efficacy of pessimism in asynchronous Q-learning
This paper is concerned with the asynchronous form of Q-learning, which applies a
stochastic approximation scheme to Markovian data samples. Motivated by the recent …
A finite sample complexity bound for distributionally robust Q-learning
We consider a reinforcement learning setting in which the deployment environment is
different from the training environment. Applying a robust Markov decision processes …
Breaking the sample complexity barrier to regret-optimal model-free reinforcement learning
Achieving sample efficiency in online episodic reinforcement learning (RL) requires
optimally balancing exploration and exploitation. When it comes to a finite-horizon episodic …