[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
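
As a minimal illustration of the multi-armed bandit framework the book covers, the sketch below runs the classic UCB1 rule on a toy Bernoulli instance. The instance, horizon, and constants are invented for illustration, not taken from the book.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-armed Bernoulli bandit; true means are illustrative only.
means = np.array([0.3, 0.5, 0.7])
K, T = len(means), 5000

counts = np.zeros(K)
sums = np.zeros(K)

for t in range(T):
    if t < K:
        a = t  # pull each arm once to initialize
    else:
        # UCB1 index: empirical mean plus exploration bonus.
        ucb = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
        a = int(np.argmax(ucb))
    r = rng.random() < means[a]  # Bernoulli reward
    counts[a] += 1
    sums[a] += r

print("empirical means:", np.round(sums / counts, 3))
print("pulls per arm:", counts)
```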

Learning with good feature representations in bandits and in RL with a generative model

T Lattimore, C Szepesvári… - … conference on machine …, 2020 - proceedings.mlr.press
The construction in the recent paper by Du et al. [2019] implies that searching for a near-
optimal action in a bandit sometimes requires examining essentially all the actions, even if …

Top two algorithms revisited

M Jourdan, R Degenne, D Baudry… - Advances in …, 2022 - proceedings.neurips.cc
Top two algorithms arose as an adaptation of Thompson sampling to best arm identification
in multi-armed bandit models for parametric families of arms. They select the next arm to …
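
A minimal sketch of the leader/challenger selection step that top-two sampling rules share, for Bernoulli arms with Beta posteriors. The function name, the resampling cap, and the default mixing parameter β = 0.5 are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(1)

def top_two_step(alpha, beta_param, beta=0.5, max_resample=100):
    """One selection step of a top-two sampling rule for Bernoulli arms
    with Beta(alpha, beta_param) posteriors (names/defaults illustrative).
    With probability `beta` play the posterior leader; otherwise resample
    until a different arm wins the draw: the challenger."""
    theta = rng.beta(alpha, beta_param)
    leader = int(np.argmax(theta))
    if rng.random() < beta:
        return leader
    for _ in range(max_resample):
        challenger = int(np.argmax(rng.beta(alpha, beta_param)))
        if challenger != leader:
            return challenger
    return leader  # fallback if the leader keeps winning every redraw

# Toy posterior state after some pulls of three arms.
alpha = np.array([20.0, 15.0, 5.0])       # successes + 1
beta_param = np.array([10.0, 15.0, 5.0])  # failures + 1
print("next arm:", top_two_step(alpha, beta_param))
```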

Gamification of pure exploration for linear bandits

R Degenne, P Ménard, X Shang… - … on Machine Learning, 2020 - proceedings.mlr.press
We investigate an active pure-exploration setting that includes best-arm
identification, in the context of linear stochastic bandits. While asymptotically optimal …
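
To make the setting concrete, here is a toy building block: regularized least squares over arm features with a naive uniform allocation, followed by a best-arm recommendation. The paper's game-based algorithm adaptively optimizes the allocation instead; the instance below is invented.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical linear bandit instance: arm features and unknown parameter.
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.5]])
theta_star = np.array([0.4, 0.8])

def pull(a):
    return X[a] @ theta_star + 0.1 * rng.standard_normal()

# Round-robin sampling + regularized least squares; adaptive BAI algorithms
# replace this uniform allocation with an optimized design.
lam, T = 1.0, 3000
V = lam * np.eye(2)
b = np.zeros(2)
for t in range(T):
    a = t % len(X)
    r = pull(a)
    V += np.outer(X[a], X[a])
    b += r * X[a]

theta_hat = np.linalg.solve(V, b)
print("recommended arm:", int(np.argmax(X @ theta_hat)))
```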

Mixture martingales revisited with applications to sequential tests and confidence intervals

E Kaufmann, WM Koolen - Journal of Machine Learning Research, 2021 - jmlr.org
This paper presents new deviation inequalities that are valid uniformly in time under
adaptive sampling in a multi-armed bandit model. The deviations are measured using the …
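
One standard member of this family is the Gaussian method-of-mixtures ("Laplace") bound for the mean of sub-Gaussian observations; the helper below evaluates that textbook radius and is only a special case of the paper's more general inequalities.

```python
import numpy as np

def anytime_radius(t, delta, sigma=1.0):
    """Anytime confidence radius for the empirical mean of sigma-sub-Gaussian
    samples, via the Gaussian method of mixtures: with probability >= 1 - delta,
    simultaneously for all t >= 1,
        |mean_t - mu| <= sigma * sqrt(2 * (t + 1) * log(sqrt(t + 1) / delta)) / t.
    This is the standard special case, not the paper's full family."""
    return sigma * np.sqrt(2 * (t + 1) * np.log(np.sqrt(t + 1) / delta)) / t

# The radius shrinks roughly like sqrt(log(t) / t) and is valid at every t.
for t in [10, 100, 1000, 10000]:
    print(t, round(anytime_radius(t, delta=0.05), 4))
```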

Fast pure exploration via Frank-Wolfe

PA Wang, RC Tzeng… - Advances in Neural …, 2021 - proceedings.neurips.cc
We study the problem of active pure exploration with fixed confidence in generic stochastic
bandit environments. The goal of the learner is to answer a query about the environment …
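
To make the Frank-Wolfe primitive concrete, the sketch below runs vanilla FW over the probability simplex on a smooth concave surrogate. The paper applies FW-style updates to the non-smooth pure-exploration allocation objective, so the objective and target here are illustrative stand-ins.

```python
import numpy as np

# Maximize f(w) = -||w - c||^2 over the simplex with Frank-Wolfe.
# `c` is an arbitrary illustrative target (already on the simplex).
c = np.array([0.5, 0.3, 0.2])

def grad(w):
    return -2.0 * (w - c)

K = 3
w = np.ones(K) / K
for t in range(200):
    g = grad(w)
    vertex = np.zeros(K)
    vertex[np.argmax(g)] = 1.0   # linear maximization oracle on the simplex
    gamma = 2.0 / (t + 2.0)      # standard FW step size
    w = (1 - gamma) * w + gamma * vertex

print("FW iterate:", np.round(w, 3), "target:", c)
```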

Fixed-confidence guarantees for Bayesian best-arm identification

X Shang, R de Heide, P Ménard… - International …, 2020 - proceedings.mlr.press
We investigate and provide new insights on the sampling rule called Top-Two Thompson
Sampling (TTTS). In particular, we justify its use for fixed-confidence best-arm identification …
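
Fixed-confidence procedures pair a sampling rule such as TTTS with a stopping rule; below is a simplified Chernoff/GLR stopping check for Gaussian arms. The threshold log((1 + log t)/δ) is a common heuristic simplification, not the calibrated threshold needed for formal δ-correctness.

```python
import numpy as np

def glr_stop(means, counts, delta, sigma=1.0):
    """Simplified Chernoff/GLR stopping check for Gaussian best-arm
    identification. Returns (should_stop, best_arm). The threshold is a
    heuristic stand-in for the calibrated ones used in proofs."""
    best = int(np.argmax(means))
    t = counts.sum()
    threshold = np.log((1.0 + np.log(t)) / delta)
    z = np.inf
    for b in range(len(means)):
        if b == best:
            continue
        gap = means[best] - means[b]
        # GLR statistic for "arm `best` beats arm `b`" with known variance.
        stat = gap**2 / (2 * sigma**2 * (1 / counts[best] + 1 / counts[b]))
        z = min(z, stat)
    return z > threshold, best

means = np.array([0.55, 0.50, 0.40])
counts = np.array([400.0, 380.0, 150.0])
print(glr_stop(means, counts, delta=0.05))
```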

Adaptive exploration in linear contextual bandit

B Hao, T Lattimore… - … Conference on Artificial …, 2020 - proceedings.mlr.press
Contextual bandits serve as a fundamental model for many sequential decision-making
tasks. The most popular theoretically justified approaches are based on the optimism …
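
The optimism baseline the snippet refers to can be sketched as LinUCB-style index selection; the instance, noise level, and bonus scale α below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# LinUCB-style optimism in a linear contextual bandit (toy instance).
d, K, T, lam, alpha = 2, 4, 2000, 1.0, 1.0
theta_star = rng.standard_normal(d)

V = lam * np.eye(d)
b = np.zeros(d)
for t in range(T):
    contexts = rng.standard_normal((K, d))  # fresh arm features each round
    Vinv = np.linalg.inv(V)
    theta_hat = Vinv @ b
    # Optimistic index: estimate plus exploration bonus from the ellipsoid.
    ucb = contexts @ theta_hat + alpha * np.sqrt(
        np.einsum("kd,de,ke->k", contexts, Vinv, contexts))
    a = int(np.argmax(ucb))
    r = contexts[a] @ theta_star + 0.1 * rng.standard_normal()
    V += np.outer(contexts[a], contexts[a])
    b += r * contexts[a]

print("final estimate error:",
      np.linalg.norm(np.linalg.solve(V, b) - theta_star))
```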

On the existence of a complexity in fixed budget bandit identification

R Degenne - The Thirty Sixth Annual Conference on …, 2023 - proceedings.mlr.press
In fixed budget bandit identification, an algorithm sequentially observes samples from
several distributions up to a given final time. It then answers a query about the set of …
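
A classic algorithm for this fixed-budget setting is Sequential Halving (Karnin et al., 2013); the sketch below is that baseline on a hypothetical Bernoulli instance, not the paper's contribution (the paper asks whether an instance-dependent complexity exists for the setting at all).

```python
import numpy as np

rng = np.random.default_rng(4)

def sequential_halving(means, budget):
    """Sequential Halving for fixed-budget best-arm identification:
    split the budget across elimination rounds, sample surviving arms
    equally, and keep the empirically better half each round."""
    arms = list(range(len(means)))
    rounds = int(np.ceil(np.log2(len(arms))))
    for _ in range(rounds):
        if len(arms) == 1:
            break
        n = budget // (len(arms) * rounds)  # equal split within this round
        scores = [rng.binomial(n, means[a]) / max(n, 1) for a in arms]
        order = np.argsort(scores)[::-1]
        arms = [arms[i] for i in order[: max(1, len(arms) // 2)]]
    return arms[0]

print("identified arm:",
      sequential_halving([0.3, 0.4, 0.5, 0.7], budget=4000))
```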

Instance-optimality in interactive decision making: Toward a non-asymptotic theory

AJ Wagenmaker, DJ Foster - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We consider the development of adaptive, instance-dependent algorithms for interactive
decision making (bandits, reinforcement learning, and beyond) that, rather than only …