Gamification of pure exploration for linear bandits

Y Jedra, A Proutiere - Advances in Neural Information …, 2020 - proceedings.neurips.cc

We study the problem of best-arm identification with fixed confidence in stochastic linear
bandits. The objective is to identify the best arm with a given level of certainty while …

Cited by 80 · All 6 versions

Fast pure exploration via Frank-Wolfe

PA Wang, RC Tzeng… - Advances in Neural …, 2021 - proceedings.neurips.cc

We study the problem of active pure exploration with fixed confidence in generic stochastic
bandit environments. The goal of the learner is to answer a query about the environment …

Cited by 36 · All 9 versions

High-dimensional experimental design and kernel bandits

R Camilleri, K Jamieson… - … on Machine Learning, 2021 - proceedings.mlr.press

In recent years methods from optimal linear experimental design have been leveraged to
obtain state of the art results for linear bandits. A design returned from an objective such as …

Cited by 49 · All 5 versions

Instance-optimal PAC algorithms for contextual bandits

Z Li, L Ratliff, KG Jamieson… - Advances in Neural …, 2022 - proceedings.neurips.cc

In the stochastic contextual bandit setting, regret-minimizing algorithms have been
extensively researched, but their instance-minimizing best-arm identification counterparts …

Cited by 20 · All 12 versions

Adaptivity and confounding in multi-armed bandit experiments

C Qin, D Russo - arXiv preprint arXiv:2202.09036, 2022 - aeaweb.org

We explore a new model of bandit experiments where a potentially nonstationary sequence
of contexts influences arms' performance. Context-unaware algorithms risk confounding …

Cited by 30 · All 3 versions

Instance-optimality in interactive decision making: Toward a non-asymptotic theory

AJ Wagenmaker, DJ Foster - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press

We consider the development of adaptive, instance-dependent algorithms for interactive
decision making (bandits, reinforcement learning, and beyond) that, rather than only …

Cited by 14 · All 3 versions

An asymptotically optimal primal-dual incremental algorithm for contextual linear bandits

A Tirinzoni, M Pirotta, M Restelli… - Advances in Neural …, 2020 - proceedings.neurips.cc

In the contextual linear bandit setting, algorithms built on the optimism principle fail to exploit
the structure of the problem and have been shown to be asymptotically suboptimal. In this …

Cited by 46 · All 7 versions

Regret minimization via saddle point optimization

J Kirschner, A Bakhtiari, K Chandak… - Advances in …, 2024 - proceedings.neurips.cc

A long line of works characterizes the sample complexity of regret minimization in sequential
decision-making by min-max programs. In the corresponding saddle-point game, the min …

Cited by 2 · All 6 versions

Experiment planning with function approximation

A Pacchiano, J Lee, E Brunskill - Advances in Neural …, 2024 - proceedings.neurips.cc

We study the problem of experiment planning with function approximation in contextual
bandit problems. In settings where there is a significant overhead to deploying adaptive …

Cited by 3 · All 6 versions

Leveraging good representations in linear contextual bandits

M Papini, A Tirinzoni, M Restelli… - International …, 2021 - proceedings.mlr.press

The linear contextual bandit literature is mostly focused on the design of efficient learning
algorithms for a given representation. However, a contextual bandit problem may admit …

Cited by 28 · All 8 versions