Non-asymptotic pure exploration by solving games
R Degenne, WM Koolen… - Advances in Neural …, 2019 - proceedings.neurips.cc
Pure exploration (aka active testing) is the fundamental task of sequentially gathering
information to answer a query about a stochastic environment. Good algorithms make few …
On the existence of a complexity in fixed budget bandit identification
R Degenne - The Thirty Sixth Annual Conference on …, 2023 - proceedings.mlr.press
In fixed budget bandit identification, an algorithm sequentially observes samples from
several distributions up to a given final time. It then answers a query about the set of …
Instance-optimality in interactive decision making: Toward a non-asymptotic theory
AJ Wagenmaker, DJ Foster - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We consider the development of adaptive, instance-dependent algorithms for interactive
decision making (bandits, reinforcement learning, and beyond) that, rather than only …
An empirical process approach to the union bound: Practical algorithms for combinatorial and linear bandits
J Katz-Samuels, L Jain… - Advances in Neural …, 2020 - proceedings.neurips.cc
This paper proposes near-optimal algorithms for the pure-exploration linear bandit problem
in the fixed confidence and fixed budget settings. Leveraging ideas from the theory of …
PopArt: Efficient sparse regression and experimental design for optimal sparse linear bandits
In sparse linear bandits, a learning agent sequentially selects an action from a fixed action
set and receives reward feedback, and the reward function depends linearly on a few …
An ε-Best-Arm Identification Algorithm for Fixed-Confidence and Beyond
We propose EB-TC$\varepsilon$, a novel sampling rule for $\varepsilon$-best arm
identification in stochastic bandits. It is the first instance of a Top Two algorithm analyzed for …
Revisiting simple regret: Fast rates for returning a good arm
Simple regret is a natural and parameter-free performance criterion for pure exploration in
multi-armed bandits yet is less popular than the probability of missing the best arm or an …
A framework for multi-A(rmed)/B(andit) testing with online FDR control
We propose an alternative framework to existing setups for controlling false alarms when
multiple A/B tests are run over time. This setup arises in many practical applications, e.g. …
Active learning with safety constraints
R Camilleri, A Wagenmaker… - Advances in …, 2022 - proceedings.neurips.cc
Active learning methods have shown great promise in reducing the number of samples
necessary for learning. As automated learning systems are adopted into real-time, real …
Non-asymptotic analysis of a ucb-based top two algorithm
A Top Two sampling rule for bandit identification is a method that selects the next arm to
sample from among two candidate arms, a leader and a challenger. Due to their simplicity …
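The leader/challenger idea described in this last abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the challenger index and the β-tradeoff below are generic, simplified choices in the spirit of Top Two rules, and all names (`top_two_sample`, the toy Bernoulli instance) are hypothetical.

```python
import random

def top_two_sample(means_hat, counts, beta=0.5, rng=random):
    """One step of a generic Top Two sampling rule (illustrative sketch).

    Leader: arm with the highest empirical mean.
    Challenger: among the other arms, the one whose squared empirical gap
    to the leader, scaled by its sample count, is smallest (a crude
    stand-in for transportation-cost-based challengers).
    """
    leader = max(range(len(means_hat)), key=lambda a: means_hat[a])
    challenger = min(
        (a for a in range(len(means_hat)) if a != leader),
        key=lambda a: (means_hat[leader] - means_hat[a]) ** 2 * counts[a],
    )
    # Sample the leader with probability beta, else the challenger.
    return leader if rng.random() < beta else challenger

# Toy Bernoulli instance (illustrative parameters, not from any paper).
rng = random.Random(0)
true_means = [0.5, 0.6, 0.4]
counts = [1, 1, 1]  # one initial pull per arm
sums = [float(rng.random() < m) for m in true_means]

for _ in range(1000):
    means_hat = [s / n for s, n in zip(sums, counts)]
    a = top_two_sample(means_hat, counts, rng=rng)
    sums[a] += float(rng.random() < true_means[a])
    counts[a] += 1

recommended = max(range(3), key=lambda a: sums[a] / counts[a])
```

The β parameter controls how often the leader is played versus the challenger; β = 1/2 is the common default in the Top Two literature.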