Non-asymptotic pure exploration by solving games

R Degenne, WM Koolen… - Advances in Neural Information Processing Systems, 2019 - proceedings.neurips.cc
Pure exploration (aka active testing) is the fundamental task of sequentially gathering
information to answer a query about a stochastic environment. Good algorithms make few …
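
For context, the "games" of the title are the max–min problem behind the standard fixed-confidence lower bound (a known characterization in the Garivier–Kaufmann style, stated here as background rather than as this paper's result): any δ-correct strategy on instance $\mu$ satisfies

$$\mathbb{E}_{\mu}[\tau_\delta] \;\ge\; T^\star(\mu)\,\mathrm{kl}(\delta,\,1-\delta), \qquad T^\star(\mu)^{-1} \;=\; \max_{w \in \Delta_K}\; \inf_{\lambda \in \mathrm{Alt}(\mu)}\; \sum_{k=1}^{K} w_k\,\mathrm{KL}(\mu_k, \lambda_k),$$

where $\mathrm{Alt}(\mu)$ is the set of instances whose correct answer differs from that of $\mu$. The max over sampling weights $w$ against the adversarial alternative $\lambda$ is a zero-sum game, which this line of work solves online with learning algorithms.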

On the existence of a complexity in fixed budget bandit identification

R Degenne - The Thirty Sixth Annual Conference on Learning Theory, 2023 - proceedings.mlr.press
In fixed budget bandit identification, an algorithm sequentially observes samples from
several distributions up to a given final time. It then answers a query about the set of …
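
As a concrete illustration of the fixed-budget setting, here is a minimal sketch of the classic Successive Rejects baseline of Audibert, Bubeck, and Munos (a standard point of comparison, not this paper's contribution): the budget T is split into K−1 phases and the empirically worst arm is eliminated after each phase. The environment callback `pull` is an assumption of the sketch.

```python
import math
import random

def successive_rejects(pull, K, T):
    """Fixed-budget best-arm identification via Successive Rejects.

    pull(k) -> float: noisy reward from arm k (assumed environment callback).
    K arms, total budget of roughly T pulls; returns the recommended arm.
    """
    log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))
    active = list(range(K))
    sums, counts = [0.0] * K, [0] * K
    n_prev = 0
    for phase in range(1, K):                 # K - 1 elimination phases
        n_k = math.ceil((T - K) / (log_bar * (K + 1 - phase)))
        for arm in active:                    # pull each surviving arm equally
            for _ in range(n_k - n_prev):
                sums[arm] += pull(arm)
                counts[arm] += 1
        n_prev = n_k
        worst = min(active, key=lambda a: sums[a] / counts[a])
        active.remove(worst)                  # drop the empirically worst arm
    return active[0]

# Toy usage: three Bernoulli arms with means 0.3, 0.5, 0.7.
means = [0.3, 0.5, 0.7]
best = successive_rejects(lambda k: float(random.random() < means[k]), K=3, T=300)
print("recommended arm:", best)
```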

Instance-optimality in interactive decision making: Toward a non-asymptotic theory

AJ Wagenmaker, DJ Foster - The Thirty Sixth Annual Conference on Learning Theory, 2023 - proceedings.mlr.press
We consider the development of adaptive, instance-dependent algorithms for interactive
decision making (bandits, reinforcement learning, and beyond) that, rather than only …

An empirical process approach to the union bound: Practical algorithms for combinatorial and linear bandits

J Katz-Samuels, L Jain… - Advances in Neural Information Processing Systems, 2020 - proceedings.neurips.cc
This paper proposes near-optimal algorithms for the pure-exploration linear bandit problem
in the fixed confidence and fixed budget settings. Leveraging ideas from the theory of …
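
Algorithms in this line allocate samples through experimental design. As background (a generic sketch, not the paper's algorithm), the snippet below computes a G-optimal design over a finite arm set with Frank–Wolfe; by the Kiefer–Wolfowitz theorem, the optimal value of $\max_x \|x\|^2_{A(w)^{-1}}$ equals the dimension $d$.

```python
import numpy as np

def g_optimal_design(X, iters=2000):
    """Frank-Wolfe for a G-optimal design over the rows of X (n arms in R^d).

    Returns weights w approximately minimizing max_x x^T A(w)^{-1} x,
    where A(w) = sum_i w_i x_i x_i^T is the information matrix.
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)                        # start from the uniform design
    for t in range(iters):
        A_inv = np.linalg.inv(X.T @ (w[:, None] * X))
        g = np.einsum("ij,jk,ik->i", X, A_inv, X)  # x^T A(w)^{-1} x per arm
        i = int(np.argmax(g))                      # most uncertain direction
        gamma = 1.0 / (t + 2)                      # standard Frank-Wolfe step
        w = (1.0 - gamma) * w
        w[i] += gamma                              # shift mass toward arm i
    return w

# Toy usage: random arms in R^3; the max leverage should approach d = 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
w = g_optimal_design(X)
A_inv = np.linalg.inv(X.T @ (w[:, None] * X))
print(np.max(np.einsum("ij,jk,ik->i", X, A_inv, X)))
```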

Popart: Efficient sparse regression and experimental design for optimal sparse linear bandits

K Jang, C Zhang, KS Jun - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
In sparse linear bandits, a learning agent sequentially selects an action from a fixed action
set and receives reward feedback, and the reward function depends linearly on a few …
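
To make the setting concrete, here is a minimal explore-then-commit sketch with an off-the-shelf Lasso estimator (a generic baseline under an assumed sparse linear reward; PopArt's estimator and its experimental design are different): explore with random actions, fit a sparse parameter, then commit to the greedy action.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_etc(actions, reward, n_explore, lam=0.05, rng=None):
    """Explore-then-commit for a sparse linear bandit (generic sketch).

    actions: (n, d) array of available actions; reward(x) -> noisy scalar.
    Spends n_explore pulls on uniformly random actions, fits a Lasso
    estimate of the sparse parameter, then commits to the best action.
    """
    rng = rng or np.random.default_rng()
    idx = rng.integers(0, len(actions), size=n_explore)
    X = actions[idx]
    y = np.array([reward(x) for x in X])           # noisy linear rewards
    theta_hat = Lasso(alpha=lam).fit(X, y).coef_   # sparse estimate of theta
    return actions[np.argmax(actions @ theta_hat)]

# Toy usage: d = 50 features, only 3 of them active.
rng = np.random.default_rng(1)
theta = np.zeros(50)
theta[[3, 17, 40]] = [1.0, -0.5, 0.8]
actions = rng.normal(size=(100, 50))
best = sparse_etc(actions, lambda x: x @ theta + 0.1 * rng.normal(),
                  n_explore=400, rng=rng)
print("value of chosen action:", best @ theta)
```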

An ε-Best-Arm Identification Algorithm for Fixed-Confidence and Beyond

M Jourdan, R Degenne… - Advances in Neural Information Processing Systems, 2023 - proceedings.neurips.cc
We propose EB-TC$\varepsilon$, a novel sampling rule for $\varepsilon$-best arm
identification in stochastic bandits. It is the first instance of a Top Two algorithm analyzed for …

Revisiting simple regret: Fast rates for returning a good arm

Y Zhao, C Stephens, C Szepesvári… - International Conference on Machine Learning, 2023 - proceedings.mlr.press
Simple regret is a natural and parameter-free performance criterion for pure exploration in
multi-armed bandits yet is less popular than the probability of missing the best arm or an …
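
For reference, the two criteria being contrasted are (standard definitions, with $\mu^\star$ the best mean and $J_\tau$ the arm returned at the stopping time $\tau$):

$$r_\tau \;=\; \mu^\star - \mathbb{E}\!\left[\mu_{J_\tau}\right] \qquad \text{versus} \qquad e_{\tau,\varepsilon} \;=\; \mathbb{P}\!\left(\mu_{J_\tau} < \mu^\star - \varepsilon\right).$$

Simple regret charges the expected gap of the returned arm, while the error probability charges any recommendation outside the $\varepsilon$-optimal set, however small its gap.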

A framework for Multi-A(rmed)/B(andit) testing with online FDR control

F Yang, A Ramdas, KG Jamieson… - Advances in Neural Information Processing Systems, 2017 - proceedings.neurips.cc
We propose an alternative framework to existing setups for controlling false alarms when
multiple A/B tests are run over time. This setup arises in many practical applications, e.g. …
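
The framework pairs sequential A/B tests with an online false discovery rate (FDR) rule. Below is a schematic LORD-style rule in the spirit of Javanmard and Montanari and Ramdas et al. (a simplified sketch; the wealth accounting of the paper's exact procedure may differ): each test spends a slice of an alpha-wealth budget and earns wealth back on each rejection.

```python
import math
import random

def lord_online_fdr(p_values, alpha=0.05, w0=0.025):
    """Schematic LORD-style online FDR control (simplified sketch).

    Processes p-values in order; the test level alpha_t is drawn from an
    alpha-wealth budget that grows whenever a hypothesis is rejected.
    gamma is a fixed nonnegative sequence normalized over the horizon.
    """
    n = len(p_values)
    raw = [1.0 / ((t + 2) * math.log(t + 2) ** 2) for t in range(n)]
    z = sum(raw)
    gamma = [g / z for g in raw]

    rejections, rej_times = [], []
    for t, p in enumerate(p_values):
        alpha_t = gamma[t] * w0
        for j, tau in enumerate(rej_times):    # wealth earned per past rejection
            bonus = (alpha - w0) if j == 0 else alpha
            alpha_t += bonus * gamma[t - tau - 1]
        reject = p <= alpha_t
        rejections.append(reject)
        if reject:
            rej_times.append(t)
    return rejections

# Toy usage: three strong signals among uniform-null p-values.
ps = [random.random() for _ in range(100)]
ps[5], ps[6], ps[20] = 1e-6, 1e-5, 1e-4
print([i for i, r in enumerate(lord_online_fdr(ps)) if r])
```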

Active learning with safety constraints

R Camilleri, A Wagenmaker… - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
Active learning methods have shown great promise in reducing the number of samples
necessary for learning. As automated learning systems are adopted into real-time, real …

Non-asymptotic analysis of a UCB-based Top Two algorithm

M Jourdan, R Degenne - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
A Top Two sampling rule for bandit identification is a method which selects the next arm to
sample from among two candidate arms, a leader and a challenger. Due to their simplicity …
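
To illustrate the leader/challenger mechanism, here is one sampling step of a generic Top Two rule with an empirical-best leader and a Gaussian transportation-cost challenger (a sketch of the general template; the cited paper analyzes a UCB-based leader instead):

```python
import numpy as np

def top_two_step(means, counts, beta=0.5, rng=None):
    """One step of a generic Top Two sampling rule (Gaussian, unit variance).

    means, counts: empirical means and pull counts per arm (counts >= 1).
    Picks the empirical best as leader, the arm with the smallest
    transportation cost to overtake it as challenger, then samples the
    leader with probability beta and the challenger otherwise.
    """
    rng = rng or np.random.default_rng()
    leader = int(np.argmax(means))
    costs = np.full(len(means), np.inf)
    for j in range(len(means)):
        if j == leader:
            continue
        gap = means[leader] - means[j]
        # Gaussian cost of the event "arm j overtakes the leader".
        costs[j] = gap ** 2 / (2.0 * (1.0 / counts[leader] + 1.0 / counts[j]))
    challenger = int(np.argmin(costs))
    return leader if rng.random() < beta else challenger

# Toy usage: decide which arm to pull next given current statistics.
print(top_two_step(np.array([0.40, 0.55, 0.60]), np.array([10, 12, 15])))
```

Tossing a β-coin (β ≈ 1/2) between the two candidates balances effort between confirming the leader and testing its strongest competitor.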