Non-asymptotic pure exploration by solving games

R Degenne, WM Koolen… - Advances in Neural Information Processing Systems, 2019 - proceedings.neurips.cc
Pure exploration (aka active testing) is the fundamental task of sequentially gathering
information to answer a query about a stochastic environment. Good algorithms make few …
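
For context, the "games" of the title are the max–min problem behind the standard fixed-confidence lower bound (a known characterization in the Garivier–Kaufmann style, stated here as background rather than as this paper's result): any δ-correct strategy on instance $\mu$ satisfies

$$\mathbb{E}_{\mu}[\tau_\delta] \;\ge\; T^\star(\mu)\,\mathrm{kl}(\delta,\,1-\delta), \qquad T^\star(\mu)^{-1} \;=\; \max_{w \in \Delta_K}\; \inf_{\lambda \in \mathrm{Alt}(\mu)}\; \sum_{k=1}^{K} w_k\,\mathrm{KL}(\mu_k, \lambda_k),$$

where $\mathrm{Alt}(\mu)$ is the set of instances whose correct answer differs from that of $\mu$. The max over sampling weights $w$ against the adversarial alternative $\lambda$ is a zero-sum game, which this line of work solves online with learning algorithms.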

On the existence of a complexity in fixed budget bandit identification

R Degenne - The Thirty Sixth Annual Conference on Learning Theory, 2023 - proceedings.mlr.press
In fixed budget bandit identification, an algorithm sequentially observes samples from
several distributions up to a given final time. It then answers a query about the set of …
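
As a concrete illustration of the fixed-budget setting, here is a minimal sketch of the classic Successive Rejects baseline of Audibert, Bubeck, and Munos (a standard point of comparison, not this paper's contribution): the budget T is split into K−1 phases and the empirically worst arm is eliminated after each phase. The environment callback `pull` is an assumption of the sketch.

```python
import math
import random

def successive_rejects(pull, K, T):
    """Fixed-budget best-arm identification via Successive Rejects.

    pull(k) -> float: noisy reward from arm k (assumed environment callback).
    K arms, total budget of roughly T pulls; returns the recommended arm.
    """
    log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))
    active = list(range(K))
    sums, counts = [0.0] * K, [0] * K
    n_prev = 0
    for phase in range(1, K):                 # K - 1 elimination phases
        n_k = math.ceil((T - K) / (log_bar * (K + 1 - phase)))
        for arm in active:                    # pull each surviving arm equally
            for _ in range(n_k - n_prev):
                sums[arm] += pull(arm)
                counts[arm] += 1
        n_prev = n_k
        worst = min(active, key=lambda a: sums[a] / counts[a])
        active.remove(worst)                  # drop the empirically worst arm
    return active[0]

# Toy usage: three Bernoulli arms with means 0.3, 0.5, 0.7.
means = [0.3, 0.5, 0.7]
best = successive_rejects(lambda k: float(random.random() < means[k]), K=3, T=300)
print("recommended arm:", best)
```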

Instance-optimality in interactive decision making: Toward a non-asymptotic theory

AJ Wagenmaker, DJ Foster - The Thirty Sixth Annual Conference on Learning Theory, 2023 - proceedings.mlr.press
We consider the development of adaptive, instance-dependent algorithms for interactive
decision making (bandits, reinforcement learning, and beyond) that, rather than only …

An empirical process approach to the union bound: Practical algorithms for combinatorial and linear bandits

J Katz-Samuels, L Jain… - Advances in Neural Information Processing Systems, 2020 - proceedings.neurips.cc
This paper proposes near-optimal algorithms for the pure-exploration linear bandit problem
in the fixed confidence and fixed budget settings. Leveraging ideas from the theory of …
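
Algorithms in this line allocate samples through experimental design. As background (a generic sketch, not the paper's algorithm), the snippet below computes a G-optimal design over a finite arm set with Frank–Wolfe; by the Kiefer–Wolfowitz theorem, the optimal value of $\max_x \|x\|^2_{A(w)^{-1}}$ equals the dimension $d$.

```python
import numpy as np

def g_optimal_design(X, iters=2000):
    """Frank-Wolfe for a G-optimal design over the rows of X (n arms in R^d).

    Returns weights w approximately minimizing max_x x^T A(w)^{-1} x,
    where A(w) = sum_i w_i x_i x_i^T is the information matrix.
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)                        # start from the uniform design
    for t in range(iters):
        A_inv = np.linalg.inv(X.T @ (w[:, None] * X))
        g = np.einsum("ij,jk,ik->i", X, A_inv, X)  # x^T A(w)^{-1} x per arm
        i = int(np.argmax(g))                      # most uncertain direction
        gamma = 1.0 / (t + 2)                      # standard Frank-Wolfe step
        w = (1.0 - gamma) * w
        w[i] += gamma                              # shift mass toward arm i
    return w

# Toy usage: random arms in R^3; the max leverage should approach d = 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
w = g_optimal_design(X)
A_inv = np.linalg.inv(X.T @ (w[:, None] * X))
print(np.max(np.einsum("ij,jk,ik->i", X, A_inv, X)))
```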

Popart: Efficient sparse regression and experimental design for optimal sparse linear bandits

K Jang, C Zhang, KS Jun - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
In sparse linear bandits, a learning agent sequentially selects an action from a fixed action
set and receives reward feedback, and the reward function depends linearly on a few …
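
To make the setting concrete, here is a minimal explore-then-commit sketch with an off-the-shelf Lasso estimator (a generic baseline under an assumed sparse linear reward; PopArt's estimator and its experimental design are different): explore with random actions, fit a sparse parameter, then commit to the greedy action.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_etc(actions, reward, n_explore, lam=0.05, rng=None):
    """Explore-then-commit for a sparse linear bandit (generic sketch).

    actions: (n, d) array of available actions; reward(x) -> noisy scalar.
    Spends n_explore pulls on uniformly random actions, fits a Lasso
    estimate of the sparse parameter, then commits to the best action.
    """
    rng = rng or np.random.default_rng()
    idx = rng.integers(0, len(actions), size=n_explore)
    X = actions[idx]
    y = np.array([reward(x) for x in X])           # noisy linear rewards
    theta_hat = Lasso(alpha=lam).fit(X, y).coef_   # sparse estimate of theta
    return actions[np.argmax(actions @ theta_hat)]

# Toy usage: d = 50 features, only 3 of them active.
rng = np.random.default_rng(1)
theta = np.zeros(50)
theta[[3, 17, 40]] = [1.0, -0.5, 0.8]
actions = rng.normal(size=(100, 50))
best = sparse_etc(actions, lambda x: x @ theta + 0.1 * rng.normal(),
                  n_explore=400, rng=rng)
print("value of chosen action:", best @ theta)
```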

An ε-Best-Arm Identification Algorithm for Fixed-Confidence and Beyond

M Jourdan, R Degenne… - Advances in Neural Information Processing Systems, 2023 - proceedings.neurips.cc
We propose EB-TC$\varepsilon$, a novel sampling rule for $\varepsilon$-best arm
identification in stochastic bandits. It is the first instance of a Top Two algorithm analyzed for …

Revisiting simple regret: Fast rates for returning a good arm

Y Zhao, C Stephens, C Szepesvári… - International Conference on Machine Learning, 2023 - proceedings.mlr.press
Simple regret is a natural and parameter-free performance criterion for pure exploration in
multi-armed bandits yet is less popular than the probability of missing the best arm or an …
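
For reference, the two criteria being contrasted are (standard definitions, with $\mu^\star$ the best mean and $J_\tau$ the arm returned at the stopping time $\tau$):

$$r_\tau \;=\; \mu^\star - \mathbb{E}\!\left[\mu_{J_\tau}\right] \qquad \text{versus} \qquad e_{\tau,\varepsilon} \;=\; \mathbb{P}\!\left(\mu_{J_\tau} < \mu^\star - \varepsilon\right).$$

Simple regret charges the expected gap of the returned arm, while the error probability charges any recommendation outside the $\varepsilon$-optimal set, however small its gap.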

A framework for Multi-A(rmed)/B(andit) testing with online FDR control

F Yang, A Ramdas, KG Jamieson… - Advances in Neural Information Processing Systems, 2017 - proceedings.neurips.cc
We propose an alternative framework to existing setups for controlling false alarms when
multiple A/B tests are run over time. This setup arises in many practical applications, e.g. …
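
The framework pairs sequential A/B tests with an online false discovery rate (FDR) rule. Below is a schematic LORD-style rule in the spirit of Javanmard and Montanari and Ramdas et al. (a simplified sketch; the wealth accounting of the paper's exact procedure may differ): each test spends a slice of an alpha-wealth budget and earns wealth back on each rejection.

```python
import math
import random

def lord_online_fdr(p_values, alpha=0.05, w0=0.025):
    """Schematic LORD-style online FDR control (simplified sketch).

    Processes p-values in order; the test level alpha_t is drawn from an
    alpha-wealth budget that grows whenever a hypothesis is rejected.
    gamma is a fixed nonnegative sequence normalized over the horizon.
    """
    n = len(p_values)
    raw = [1.0 / ((t + 2) * math.log(t + 2) ** 2) for t in range(n)]
    z = sum(raw)
    gamma = [g / z for g in raw]

    rejections, rej_times = [], []
    for t, p in enumerate(p_values):
        alpha_t = gamma[t] * w0
        for j, tau in enumerate(rej_times):    # wealth earned per past rejection
            bonus = (alpha - w0) if j == 0 else alpha
            alpha_t += bonus * gamma[t - tau - 1]
        reject = p <= alpha_t
        rejections.append(reject)
        if reject:
            rej_times.append(t)
    return rejections

# Toy usage: three strong signals among uniform-null p-values.
ps = [random.random() for _ in range(100)]
ps[5], ps[6], ps[20] = 1e-6, 1e-5, 1e-4
print([i for i, r in enumerate(lord_online_fdr(ps)) if r])
```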

Active learning with safety constraints

R Camilleri, A Wagenmaker… - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
Active learning methods have shown great promise in reducing the number of samples
necessary for learning. As automated learning systems are adopted into real-time, real …

Non-asymptotic analysis of a UCB-based Top Two algorithm

M Jourdan, R Degenne - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
A Top Two sampling rule for bandit identification is a method which selects the next arm to
sample from among two candidate arms, a leader and a challenger. Due to their simplicity …
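
To illustrate the leader/challenger mechanism, here is one sampling step of a generic Top Two rule with an empirical-best leader and a Gaussian transportation-cost challenger (a sketch of the general template; the cited paper analyzes a UCB-based leader instead):

```python
import numpy as np

def top_two_step(means, counts, beta=0.5, rng=None):
    """One step of a generic Top Two sampling rule (Gaussian, unit variance).

    means, counts: empirical means and pull counts per arm (counts >= 1).
    Picks the empirical best as leader, the arm with the smallest
    transportation cost to overtake it as challenger, then samples the
    leader with probability beta and the challenger otherwise.
    """
    rng = rng or np.random.default_rng()
    leader = int(np.argmax(means))
    costs = np.full(len(means), np.inf)
    for j in range(len(means)):
        if j == leader:
            continue
        gap = means[leader] - means[j]
        # Gaussian cost of the event "arm j overtakes the leader".
        costs[j] = gap ** 2 / (2.0 * (1.0 / counts[leader] + 1.0 / counts[j]))
    challenger = int(np.argmin(costs))
    return leader if rng.random() < beta else challenger

# Toy usage: decide which arm to pull next given current statistics.
print(top_two_step(np.array([0.40, 0.55, 0.60]), np.array([10, 12, 15])))
```

Tossing a β-coin (β ≈ 1/2) between the two candidates balances effort between confirming the leader and testing its strongest competitor.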