Optimal best-arm identification in linear bandits
Y Jedra, A Proutiere - Advances in Neural Information …, 2020 - proceedings.neurips.cc
We study the problem of best-arm identification with fixed confidence in stochastic linear
bandits. The objective is to identify the best arm with a given level of certainty while …
Fast pure exploration via Frank-Wolfe
We study the problem of active pure exploration with fixed confidence in generic stochastic
bandit environments. The goal of the learner is to answer a query about the environment …
High-dimensional experimental design and kernel bandits
R Camilleri, K Jamieson… - … on Machine Learning, 2021 - proceedings.mlr.press
In recent years methods from optimal linear experimental design have been leveraged to
obtain state of the art results for linear bandits. A design returned from an objective such as …
Instance-optimal PAC algorithms for contextual bandits
In the stochastic contextual bandit setting, regret-minimizing algorithms have been
extensively researched, but their instance-minimizing best-arm identification counterparts …
Adaptivity and confounding in multi-armed bandit experiments
We explore a new model of bandit experiments where a potentially nonstationary sequence
of contexts influences arms' performance. Context-unaware algorithms risk confounding …
Instance-optimality in interactive decision making: Toward a non-asymptotic theory
AJ Wagenmaker, DJ Foster - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We consider the development of adaptive, instance-dependent algorithms for interactive
decision making (bandits, reinforcement learning, and beyond) that, rather than only …
An asymptotically optimal primal-dual incremental algorithm for contextual linear bandits
In the contextual linear bandit setting, algorithms built on the optimism principle fail to exploit
the structure of the problem and have been shown to be asymptotically suboptimal. In this …
Regret minimization via saddle point optimization
A long line of works characterizes the sample complexity of regret minimization in sequential
decision-making by min-max programs. In the corresponding saddle-point game, the min …
Experiment planning with function approximation
We study the problem of experiment planning with function approximation in contextual
bandit problems. In settings where there is a significant overhead to deploying adaptive …
Leveraging good representations in linear contextual bandits
The linear contextual bandit literature is mostly focused on the design of efficient learning
algorithms for a given representation. However, a contextual bandit problem may admit …