[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
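
As a minimal illustration of the multi-armed bandit framework the book covers, the sketch below runs the classic UCB1 rule on a toy Bernoulli instance. The instance, horizon, and constants are invented for illustration, not taken from the book.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-armed Bernoulli bandit; true means are illustrative only.
means = np.array([0.3, 0.5, 0.7])
K, T = len(means), 5000

counts = np.zeros(K)
sums = np.zeros(K)

for t in range(T):
    if t < K:
        a = t  # pull each arm once to initialize
    else:
        # UCB1 index: empirical mean plus exploration bonus.
        ucb = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
        a = int(np.argmax(ucb))
    r = rng.random() < means[a]  # Bernoulli reward
    counts[a] += 1
    sums[a] += r

print("empirical means:", np.round(sums / counts, 3))
print("pulls per arm:", counts)
```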

Learning with good feature representations in bandits and in RL with a generative model

T Lattimore, C Szepesvári… - … conference on machine …, 2020 - proceedings.mlr.press
The construction in the recent paper by Du et al. [2019] implies that searching for a near-
optimal action in a bandit sometimes requires examining essentially all the actions, even if …

Top two algorithms revisited

M Jourdan, R Degenne, D Baudry… - Advances in …, 2022 - proceedings.neurips.cc
Top two algorithms arose as an adaptation of Thompson sampling to best arm identification
in multi-armed bandit models for parametric families of arms. They select the next arm to …
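
A minimal sketch of the leader/challenger selection step that top-two sampling rules share, for Bernoulli arms with Beta posteriors. The function name, the resampling cap, and the default mixing parameter β = 0.5 are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(1)

def top_two_step(alpha, beta_param, beta=0.5, max_resample=100):
    """One selection step of a top-two sampling rule for Bernoulli arms
    with Beta(alpha, beta_param) posteriors (names/defaults illustrative).
    With probability `beta` play the posterior leader; otherwise resample
    until a different arm wins the draw: the challenger."""
    theta = rng.beta(alpha, beta_param)
    leader = int(np.argmax(theta))
    if rng.random() < beta:
        return leader
    for _ in range(max_resample):
        challenger = int(np.argmax(rng.beta(alpha, beta_param)))
        if challenger != leader:
            return challenger
    return leader  # fallback if the leader keeps winning every redraw

# Toy posterior state after some pulls of three arms.
alpha = np.array([20.0, 15.0, 5.0])       # successes + 1
beta_param = np.array([10.0, 15.0, 5.0])  # failures + 1
print("next arm:", top_two_step(alpha, beta_param))
```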

Gamification of pure exploration for linear bandits

R Degenne, P Ménard, X Shang… - … on Machine Learning, 2020 - proceedings.mlr.press
We investigate an active pure-exploration setting that includes best-arm
identification, in the context of linear stochastic bandits. While asymptotically optimal …
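
To make the setting concrete, here is a toy building block: regularized least squares over arm features with a naive uniform allocation, followed by a best-arm recommendation. The paper's game-based algorithm adaptively optimizes the allocation instead; the instance below is invented.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical linear bandit instance: arm features and unknown parameter.
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.5]])
theta_star = np.array([0.4, 0.8])

def pull(a):
    return X[a] @ theta_star + 0.1 * rng.standard_normal()

# Round-robin sampling + regularized least squares; adaptive BAI algorithms
# replace this uniform allocation with an optimized design.
lam, T = 1.0, 3000
V = lam * np.eye(2)
b = np.zeros(2)
for t in range(T):
    a = t % len(X)
    r = pull(a)
    V += np.outer(X[a], X[a])
    b += r * X[a]

theta_hat = np.linalg.solve(V, b)
print("recommended arm:", int(np.argmax(X @ theta_hat)))
```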

Mixture martingales revisited with applications to sequential tests and confidence intervals

E Kaufmann, WM Koolen - Journal of Machine Learning Research, 2021 - jmlr.org
This paper presents new deviation inequalities that are valid uniformly in time under
adaptive sampling in a multi-armed bandit model. The deviations are measured using the …
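
One standard member of this family is the Gaussian method-of-mixtures ("Laplace") bound for the mean of sub-Gaussian observations; the helper below evaluates that textbook radius and is only a special case of the paper's more general inequalities.

```python
import numpy as np

def anytime_radius(t, delta, sigma=1.0):
    """Anytime confidence radius for the empirical mean of sigma-sub-Gaussian
    samples, via the Gaussian method of mixtures: with probability >= 1 - delta,
    simultaneously for all t >= 1,
        |mean_t - mu| <= sigma * sqrt(2 * (t + 1) * log(sqrt(t + 1) / delta)) / t.
    This is the standard special case, not the paper's full family."""
    return sigma * np.sqrt(2 * (t + 1) * np.log(np.sqrt(t + 1) / delta)) / t

# The radius shrinks roughly like sqrt(log(t) / t) and is valid at every t.
for t in [10, 100, 1000, 10000]:
    print(t, round(anytime_radius(t, delta=0.05), 4))
```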

Fast pure exploration via Frank-Wolfe

PA Wang, RC Tzeng… - Advances in Neural …, 2021 - proceedings.neurips.cc
We study the problem of active pure exploration with fixed confidence in generic stochastic
bandit environments. The goal of the learner is to answer a query about the environment …
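
To make the Frank-Wolfe primitive concrete, the sketch below runs vanilla FW over the probability simplex on a smooth concave surrogate. The paper applies FW-style updates to the non-smooth pure-exploration allocation objective, so the objective and target here are illustrative stand-ins.

```python
import numpy as np

# Maximize f(w) = -||w - c||^2 over the simplex with Frank-Wolfe.
# `c` is an arbitrary illustrative target (already on the simplex).
c = np.array([0.5, 0.3, 0.2])

def grad(w):
    return -2.0 * (w - c)

K = 3
w = np.ones(K) / K
for t in range(200):
    g = grad(w)
    vertex = np.zeros(K)
    vertex[np.argmax(g)] = 1.0   # linear maximization oracle on the simplex
    gamma = 2.0 / (t + 2.0)      # standard FW step size
    w = (1 - gamma) * w + gamma * vertex

print("FW iterate:", np.round(w, 3), "target:", c)
```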

Fixed-confidence guarantees for Bayesian best-arm identification

X Shang, R de Heide, P Ménard… - International …, 2020 - proceedings.mlr.press
We investigate and provide new insights on the sampling rule called Top-Two Thompson
Sampling (TTTS). In particular, we justify its use for fixed-confidence best-arm identification …
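
Fixed-confidence procedures pair a sampling rule such as TTTS with a stopping rule; below is a simplified Chernoff/GLR stopping check for Gaussian arms. The threshold log((1 + log t)/δ) is a common heuristic simplification, not the calibrated threshold needed for formal δ-correctness.

```python
import numpy as np

def glr_stop(means, counts, delta, sigma=1.0):
    """Simplified Chernoff/GLR stopping check for Gaussian best-arm
    identification. Returns (should_stop, best_arm). The threshold is a
    heuristic stand-in for the calibrated ones used in proofs."""
    best = int(np.argmax(means))
    t = counts.sum()
    threshold = np.log((1.0 + np.log(t)) / delta)
    z = np.inf
    for b in range(len(means)):
        if b == best:
            continue
        gap = means[best] - means[b]
        # GLR statistic for "arm `best` beats arm `b`" with known variance.
        stat = gap**2 / (2 * sigma**2 * (1 / counts[best] + 1 / counts[b]))
        z = min(z, stat)
    return z > threshold, best

means = np.array([0.55, 0.50, 0.40])
counts = np.array([400.0, 380.0, 150.0])
print(glr_stop(means, counts, delta=0.05))
```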

Adaptive exploration in linear contextual bandit

B Hao, T Lattimore… - … Conference on Artificial …, 2020 - proceedings.mlr.press
Contextual bandits serve as a fundamental model for many sequential decision-making
tasks. The most popular theoretically justified approaches are based on the optimism …
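
The optimism baseline the snippet refers to can be sketched as LinUCB-style index selection; the instance, noise level, and bonus scale α below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# LinUCB-style optimism in a linear contextual bandit (toy instance).
d, K, T, lam, alpha = 2, 4, 2000, 1.0, 1.0
theta_star = rng.standard_normal(d)

V = lam * np.eye(d)
b = np.zeros(d)
for t in range(T):
    contexts = rng.standard_normal((K, d))  # fresh arm features each round
    Vinv = np.linalg.inv(V)
    theta_hat = Vinv @ b
    # Optimistic index: estimate plus exploration bonus from the ellipsoid.
    ucb = contexts @ theta_hat + alpha * np.sqrt(
        np.einsum("kd,de,ke->k", contexts, Vinv, contexts))
    a = int(np.argmax(ucb))
    r = contexts[a] @ theta_star + 0.1 * rng.standard_normal()
    V += np.outer(contexts[a], contexts[a])
    b += r * contexts[a]

print("final estimate error:",
      np.linalg.norm(np.linalg.solve(V, b) - theta_star))
```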

On the existence of a complexity in fixed budget bandit identification

R Degenne - The Thirty Sixth Annual Conference on …, 2023 - proceedings.mlr.press
In fixed budget bandit identification, an algorithm sequentially observes samples from
several distributions up to a given final time. It then answers a query about the set of …
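
A classic algorithm for this fixed-budget setting is Sequential Halving (Karnin et al., 2013); the sketch below is that baseline on a hypothetical Bernoulli instance, not the paper's contribution (the paper asks whether an instance-dependent complexity exists for the setting at all).

```python
import numpy as np

rng = np.random.default_rng(4)

def sequential_halving(means, budget):
    """Sequential Halving for fixed-budget best-arm identification:
    split the budget across elimination rounds, sample surviving arms
    equally, and keep the empirically better half each round."""
    arms = list(range(len(means)))
    rounds = int(np.ceil(np.log2(len(arms))))
    for _ in range(rounds):
        if len(arms) == 1:
            break
        n = budget // (len(arms) * rounds)  # equal split within this round
        scores = [rng.binomial(n, means[a]) / max(n, 1) for a in arms]
        order = np.argsort(scores)[::-1]
        arms = [arms[i] for i in order[: max(1, len(arms) // 2)]]
    return arms[0]

print("identified arm:",
      sequential_halving([0.3, 0.4, 0.5, 0.7], budget=4000))
```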

Instance-optimality in interactive decision making: Toward a non-asymptotic theory

AJ Wagenmaker, DJ Foster - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We consider the development of adaptive, instance-dependent algorithms for interactive
decision making (bandits, reinforcement learning, and beyond) that, rather than only …