Dive into deep learning
This open-source book represents our attempt to make deep learning approachable,
teaching readers the concepts, the context, and the code. The entire book is drafted in …
Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity
Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …
Nearly minimax optimal reinforcement learning for linear mixture Markov decision processes
We study reinforcement learning (RL) with linear function approximation where the
underlying transition probability kernel of the Markov decision process (MDP) is a linear …
The curious price of distributional robustness in reinforcement learning with a generative model
This paper investigates model robustness in reinforcement learning (RL) via the framework
of distributionally robust Markov decision processes (RMDPs). Despite recent efforts, the …
Leveraging offline data in online reinforcement learning
A Wagenmaker, A Pacchiano - International Conference on …, 2023 - proceedings.mlr.press
Two central paradigms have emerged in the reinforcement learning (RL) community: online
RL and offline RL. In the online RL setting, the agent has no prior knowledge of the …
Unpacking reward shaping: Understanding the benefits of reward engineering on sample complexity
The success of reinforcement learning in a variety of challenging sequential decision-
making problems has been much discussed, but often ignored in this discussion is the …
Is reinforcement learning more difficult than bandits? A near-optimal algorithm escaping the curse of horizon
Episodic reinforcement learning and contextual bandits are two widely studied sequential
decision-making problems. Episodic reinforcement learning generalizes contextual bandits …
The blessing of heterogeneity in federated Q-learning: Linear speedup and beyond
In this paper, we consider federated Q-learning, which aims to learn an optimal Q-function
by periodically aggregating local Q-estimates trained on local data alone. Focusing on …
The efficacy of pessimism in asynchronous Q-learning
This paper is concerned with the asynchronous form of Q-learning, which applies a
stochastic approximation scheme to Markovian data samples. Motivated by the recent …
Revisiting the linear-programming framework for offline RL with general function approximation
Offline reinforcement learning (RL) aims to find an optimal policy for sequential decision-
making using a pre-collected dataset, without further interaction with the environment …