[BOOK] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Top two algorithms revisited

M Jourdan, R Degenne, D Baudry… - Advances in …, 2022 - proceedings.neurips.cc
Top two algorithms arose as an adaptation of Thompson sampling to best arm identification
in multi-armed bandit models for parametric families of arms. They select the next arm to …

Gamification of pure exploration for linear bandits

R Degenne, P Ménard, X Shang… - … on Machine Learning, 2020 - proceedings.mlr.press
We investigate an active pure-exploration setting, which includes best-arm
identification, in the context of linear stochastic bandits. While asymptotically optimal …

Mixture martingales revisited with applications to sequential tests and confidence intervals

E Kaufmann, WM Koolen - Journal of Machine Learning Research, 2021 - jmlr.org
This paper presents new deviation inequalities that are valid uniformly in time under
adaptive sampling in a multi-armed bandit model. The deviations are measured using the …

Fast pure exploration via Frank-Wolfe

PA Wang, RC Tzeng… - Advances in Neural …, 2021 - proceedings.neurips.cc
We study the problem of active pure exploration with fixed confidence in generic stochastic
bandit environments. The goal of the learner is to answer a query about the environment …

Beyond no regret: Instance-dependent PAC reinforcement learning

AJ Wagenmaker, M Simchowitz… - … on Learning Theory, 2022 - proceedings.mlr.press
The theory of reinforcement learning has focused on two fundamental problems: achieving
low regret, and identifying ε-optimal policies. While a simple reduction allows one …

Non-asymptotic pure exploration by solving games

R Degenne, WM Koolen… - Advances in Neural …, 2019 - proceedings.neurips.cc
Pure exploration (aka active testing) is the fundamental task of sequentially gathering
information to answer a query about a stochastic environment. Good algorithms make few …

Best arm identification with fixed budget: A large deviation perspective

PA Wang, RC Tzeng… - Advances in Neural …, 2024 - proceedings.neurips.cc
We consider the problem of identifying the best arm in stochastic Multi-Armed Bandits
(MABs) using a fixed sampling budget. Characterizing the minimal instance-specific error …

On the existence of a complexity in fixed budget bandit identification

R Degenne - The Thirty Sixth Annual Conference on …, 2023 - proceedings.mlr.press
In fixed budget bandit identification, an algorithm sequentially observes samples from
several distributions up to a given final time. It then answers a query about the set of …

Instance-optimality in interactive decision making: Toward a non-asymptotic theory

AJ Wagenmaker, DJ Foster - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We consider the development of adaptive, instance-dependent algorithms for interactive
decision making (bandits, reinforcement learning, and beyond) that, rather than only …