When is agnostic reinforcement learning statistically tractable?
We study the problem of agnostic PAC reinforcement learning (RL): given a policy class $\Pi$,
how many rounds of interaction with an unknown MDP (with a potentially large state and …
The Value of Reward Lookahead in Reinforcement Learning
In reinforcement learning (RL), agents sequentially interact with changing environments
while aiming to maximize the obtained rewards. Usually, rewards are observed only after …
Towards instance-optimality in online PAC reinforcement learning
Several recent works have proposed instance-dependent upper bounds on the number of
episodes needed to identify, with probability $1-\delta$, an $\varepsilon$-optimal policy in …
RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation
In many real-world decision problems there is partially observed, hidden or latent
information that remains fixed throughout an interaction. Such decision problems can be …
Offline Contextual Bandit: Theory and Large Scale Applications
O Sakhi - 2023 - theses.hal.science
This thesis presents contributions to the problem of learning from logged interactions using
the offline contextual bandit framework. We are interested in two related topics: (1) offline …
The impact of data distribution on Q-learning with function approximation
We study the interplay between the data distribution and Q-learning-based algorithms with
function approximation. We provide a unified theoretical and empirical analysis as to how …
Adaptive Pure Exploration in Markov Decision Processes and Bandits
A Al Marjani - 2023 - theses.hal.science
This thesis studies pure exploration problems in Markov Decision Processes (MDP) and
Multi-Armed Bandits. These problems have mainly been studied in a “worst-case” …