[BOOK] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Top two algorithms revisited

M Jourdan, R Degenne, D Baudry… - Advances in …, 2022 - proceedings.neurips.cc
Top two algorithms arose as an adaptation of Thompson sampling to best arm identification
in multi-armed bandit models for parametric families of arms. They select the next arm to …

Gamification of pure exploration for linear bandits

R Degenne, P Ménard, X Shang… - … on Machine Learning, 2020 - proceedings.mlr.press
We investigate an active pure-exploration setting, which includes best-arm
identification, in the context of linear stochastic bandits. While asymptotically optimal …

Mixture martingales revisited with applications to sequential tests and confidence intervals

E Kaufmann, WM Koolen - Journal of Machine Learning Research, 2021 - jmlr.org
This paper presents new deviation inequalities that are valid uniformly in time under
adaptive sampling in a multi-armed bandit model. The deviations are measured using the …

Fast pure exploration via Frank-Wolfe

PA Wang, RC Tzeng… - Advances in Neural …, 2021 - proceedings.neurips.cc
We study the problem of active pure exploration with fixed confidence in generic stochastic
bandit environments. The goal of the learner is to answer a query about the environment …

Beyond no regret: Instance-dependent PAC reinforcement learning

AJ Wagenmaker, M Simchowitz… - … on Learning Theory, 2022 - proceedings.mlr.press
The theory of reinforcement learning has focused on two fundamental problems: achieving
low regret, and identifying ε-optimal policies. While a simple reduction allows one …

Non-asymptotic pure exploration by solving games

R Degenne, WM Koolen… - Advances in Neural …, 2019 - proceedings.neurips.cc
Pure exploration (aka active testing) is the fundamental task of sequentially gathering
information to answer a query about a stochastic environment. Good algorithms make few …

Best arm identification with fixed budget: A large deviation perspective

PA Wang, RC Tzeng… - Advances in Neural …, 2024 - proceedings.neurips.cc
We consider the problem of identifying the best arm in stochastic Multi-Armed Bandits
(MABs) using a fixed sampling budget. Characterizing the minimal instance-specific error …

On the existence of a complexity in fixed budget bandit identification

R Degenne - The Thirty Sixth Annual Conference on …, 2023 - proceedings.mlr.press
In fixed budget bandit identification, an algorithm sequentially observes samples from
several distributions up to a given final time. It then answers a query about the set of …

Instance-optimality in interactive decision making: Toward a non-asymptotic theory

AJ Wagenmaker, DJ Foster - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We consider the development of adaptive, instance-dependent algorithms for interactive
decision making (bandits, reinforcement learning, and beyond) that, rather than only …