Optimal best-arm identification in linear bandits

Y Jedra, A Proutiere - Advances in Neural Information …, 2020 - proceedings.neurips.cc
We study the problem of best-arm identification with fixed confidence in stochastic linear
bandits. The objective is to identify the best arm with a given level of certainty while …

Fast pure exploration via Frank-Wolfe

PA Wang, RC Tzeng… - Advances in Neural …, 2021 - proceedings.neurips.cc
We study the problem of active pure exploration with fixed confidence in generic stochastic
bandit environments. The goal of the learner is to answer a query about the environment …

High-dimensional experimental design and kernel bandits

R Camilleri, K Jamieson… - … on Machine Learning, 2021 - proceedings.mlr.press
In recent years methods from optimal linear experimental design have been leveraged to
obtain state of the art results for linear bandits. A design returned from an objective such as …

Instance-optimal PAC algorithms for contextual bandits

Z Li, L Ratliff, KG Jamieson… - Advances in Neural …, 2022 - proceedings.neurips.cc
In the stochastic contextual bandit setting, regret-minimizing algorithms have been
extensively researched, but their instance-minimizing best-arm identification counterparts …

Adaptivity and confounding in multi-armed bandit experiments

C Qin, D Russo - arXiv preprint arXiv:2202.09036, 2022 - aeaweb.org
We explore a new model of bandit experiments where a potentially nonstationary sequence
of contexts influences arms' performance. Context-unaware algorithms risk confounding …

Instance-optimality in interactive decision making: Toward a non-asymptotic theory

AJ Wagenmaker, DJ Foster - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We consider the development of adaptive, instance-dependent algorithms for interactive
decision making (bandits, reinforcement learning, and beyond) that, rather than only …

An asymptotically optimal primal-dual incremental algorithm for contextual linear bandits

A Tirinzoni, M Pirotta, M Restelli… - Advances in Neural …, 2020 - proceedings.neurips.cc
In the contextual linear bandit setting, algorithms built on the optimism principle fail to exploit
the structure of the problem and have been shown to be asymptotically suboptimal. In this …

Regret minimization via saddle point optimization

J Kirschner, A Bakhtiari, K Chandak… - Advances in …, 2024 - proceedings.neurips.cc
A long line of works characterizes the sample complexity of regret minimization in sequential
decision-making by min-max programs. In the corresponding saddle-point game, the min …

Experiment planning with function approximation

A Pacchiano, J Lee, E Brunskill - Advances in Neural …, 2024 - proceedings.neurips.cc
We study the problem of experiment planning with function approximation in contextual
bandit problems. In settings where there is a significant overhead to deploying adaptive …

Leveraging good representations in linear contextual bandits

M Papini, A Tirinzoni, M Restelli… - International …, 2021 - proceedings.mlr.press
The linear contextual bandit literature is mostly focused on the design of efficient learning
algorithms for a given representation. However, a contextual bandit problem may admit …