The statistical complexity of interactive decision making
A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …
Bellman eluder dimension: New rich classes of RL problems, and sample-efficient algorithms
Finding the minimal structural assumptions that empower sample-efficient learning is one of
the most important research directions in Reinforcement Learning (RL). This paper …
Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity
Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …
Policy finetuning: Bridging sample-efficient offline and online reinforcement learning
Recent theoretical work studies sample-efficient reinforcement learning (RL) extensively in
two settings: learning interactively in the environment (online RL), or learning from an offline …
A sharp analysis of model-based reinforcement learning with self-play
Model-based algorithms (algorithms that explore the environment through building
and utilizing an estimated model) are widely used in reinforcement learning practice and …
When can we learn general-sum Markov games with a large number of players sample-efficiently?
Multi-agent reinforcement learning has made substantial empirical progress in solving
games with a large number of players. However, theoretically, the best known sample …
Breaking the sample size barrier in model-based reinforcement learning with a generative model
We investigate the sample efficiency of reinforcement learning in a $\gamma$-discounted
infinite-horizon Markov decision process (MDP) with state space S and action space A …
Dueling RL: Reinforcement learning with trajectory preferences
We consider the problem of preference-based reinforcement learning (PbRL), where, unlike
traditional reinforcement learning (RL), an agent receives feedback only in terms of 1 bit …
A general framework for sample-efficient function approximation in reinforcement learning
With the increasing need for handling large state and action spaces, general function
approximation has become a key technique in reinforcement learning (RL). In this paper, we …
Breaking the sample complexity barrier to regret-optimal model-free reinforcement learning
Achieving sample efficiency in online episodic reinforcement learning (RL) requires
optimally balancing exploration and exploitation. When it comes to a finite-horizon episodic …