Dive into deep learning
This open-source book represents our attempt to make deep learning approachable,
teaching readers the concepts, the context, and the code. The entire book is drafted in …
Nearly minimax optimal reinforcement learning for linear mixture Markov decision processes
We study reinforcement learning (RL) with linear function approximation where the
underlying transition probability kernel of the Markov decision process (MDP) is a linear …
Nearly minimax optimal reinforcement learning for linear Markov decision processes
We study reinforcement learning (RL) with linear function approximation. For episodic time-
inhomogeneous linear Markov decision processes (linear MDPs) whose transition …
Learning near optimal policies with low inherent Bellman error
We study the exploration problem with approximate linear action-value functions in episodic
reinforcement learning under the notion of low inherent Bellman error, a condition normally …
Reward-free exploration for reinforcement learning
C Jin, A Krishnamurthy… - … on Machine Learning, 2020 - proceedings.mlr.press
Exploration is widely regarded as one of the most challenging aspects of reinforcement
learning (RL), with many naive approaches succumbing to exponential sample complexity …
Leveraging offline data in online reinforcement learning
A Wagenmaker, A Pacchiano - International Conference on …, 2023 - proceedings.mlr.press
Two central paradigms have emerged in the reinforcement learning (RL) community: online
RL and offline RL. In the online RL setting, the agent has no prior knowledge of the …
Guarantees for epsilon-greedy reinforcement learning with function approximation
Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian noise fail to
explore efficiently in some reinforcement learning tasks and yet, they perform well in many …
Almost optimal model-free reinforcement learning via reference-advantage decomposition
We study the reinforcement learning problem in the setting of finite-horizon episodic Markov
Decision Processes (MDPs) with S states, A actions, and episode length H. We propose a …
Is reinforcement learning more difficult than bandits? a near-optimal algorithm escaping the curse of horizon
Episodic reinforcement learning and contextual bandits are two widely studied sequential
decision-making problems. Episodic reinforcement learning generalizes contextual bandits …
Optimism in reinforcement learning with generalized linear function approximation
We design a new provably efficient algorithm for episodic reinforcement learning with
generalized linear function approximation. We analyze the algorithm under a new …