Active coverage for PAC reinforcement learning

Z Jia, G Li, A Rakhlin, A Sekhari… - Advances in Neural …, 2024 - proceedings.neurips.cc

We study the problem of agnostic PAC reinforcement learning (RL): given a policy class $\Pi$,
how many rounds of interaction with an unknown MDP (with a potentially large state and …

Cited by 4 · Related articles · All 6 versions

The Value of Reward Lookahead in Reinforcement Learning

N Merlis, D Baudry, V Perchet - arXiv preprint arXiv:2403.11637, 2024 - arxiv.org

In reinforcement learning (RL), agents sequentially interact with changing environments
while aiming to maximize the obtained rewards. Usually, rewards are observed only after …

Cited by 1 · Related articles · All 4 versions

Towards instance-optimality in online PAC reinforcement learning

A Al-Marjani, A Tirinzoni, E Kaufmann - arXiv preprint arXiv:2311.05638, 2023 - arxiv.org

Several recent works have proposed instance-dependent upper bounds on the number of
episodes needed to identify, with probability $1-\delta$, an $\varepsilon$-optimal policy in …

Cited by 3 · Related articles · All 6 versions

RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation

J Kwon, S Mannor, C Caramanis, Y Efroni - arXiv preprint arXiv …, 2024 - arxiv.org

In many real-world decision problems there is partially observed, hidden or latent
information that remains fixed throughout an interaction. Such decision problems can be …

Cited by 1 · Related articles · All 2 versions

Offline Contextual Bandit: Theory and Large Scale Applications

O Sakhi - 2023 - theses.hal.science

This thesis presents contributions to the problem of learning from logged interactions using
the offline contextual bandit framework. We are interested in two related topics: (1) offline …

Cited by 1 · Related articles · All 6 versions

The impact of data distribution on Q-learning with function approximation

PP Santos, DS Carvalho, A Sardinha, FS Melo - Machine Learning, 2024 - Springer

We study the interplay between the data distribution and Q-learning-based algorithms with
function approximation. We provide a unified theoretical and empirical analysis as to how …

Cited by 2 · Related articles · All 3 versions

Adaptive Pure Exploration in Markov Decision Processes and Bandits

A Al Marjani - 2023 - theses.hal.science

This thesis studies pure exploration problems in Markov Decision Processes (MDP) and
Multi-Armed Bandits. These problems have mainly been studied in a “worst-case” …