A tutorial on Thompson sampling
Thompson sampling is an algorithm for online decision problems where actions are taken
sequentially in a manner that must balance between exploiting what is known to maximize …
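As a rough illustration of the algorithm this tutorial covers, here is a minimal sketch of Thompson sampling for a Bernoulli bandit with Beta posteriors (the function name and setup are hypothetical, not taken from the paper):

```python
import random

def thompson_sampling(true_probs, horizon, seed=0):
    """Minimal Beta-Bernoulli Thompson sampling on a K-armed bandit."""
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1] * k  # Beta(1, 1) uniform prior for each arm
    beta = [1] * k
    total_reward = 0
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its posterior ...
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])  # ... and play the best one
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward       # posterior update on success
        beta[arm] += 1 - reward    # posterior update on failure
        total_reward += reward
    return total_reward, alpha, beta
```

Because actions are drawn according to their posterior probability of being optimal, the algorithm explores early and concentrates on the best arm as evidence accumulates.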
[BOOK] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Confidence intervals for policy evaluation in adaptive experiments
Adaptive experimental designs can dramatically improve efficiency in randomized trials. But
with adaptively collected data, common estimators based on sample means and inverse …
Adaptive treatment assignment in experiments for policy choice
M Kasy, A Sautmann - Econometrica, 2021 - Wiley Online Library
Standard experimental designs are geared toward point estimation and hypothesis testing,
while bandit algorithms are geared toward in-sample outcomes. Here, we instead consider …
Top two algorithms revisited
Top two algorithms arose as an adaptation of Thompson sampling to best arm identification
in multi-armed bandit models for parametric families of arms. They select the next arm to …
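The arm-selection rule the abstract alludes to can be sketched as follows, assuming Beta-Bernoulli posteriors as above; this is a generic top-two Thompson sampling step, with all names hypothetical:

```python
import random

def top_two_thompson_step(alpha, beta_params, beta_frac=0.5, rng=None):
    """One arm-selection step of top-two Thompson sampling.

    With probability beta_frac play the posterior leader; otherwise
    resample posteriors until a different arm (the challenger) wins.
    """
    rng = rng or random.Random()
    k = len(alpha)

    def posterior_argmax():
        samples = [rng.betavariate(alpha[i], beta_params[i]) for i in range(k)]
        return max(range(k), key=lambda i: samples[i])

    leader = posterior_argmax()
    if rng.random() < beta_frac:
        return leader
    challenger = posterior_argmax()
    while challenger == leader:  # resample until a distinct arm wins
        challenger = posterior_argmax()
    return challenger
```

Forcing play of a challenger keeps the algorithm from committing to the empirical best arm too early, which is what makes the adaptation suitable for best arm identification rather than regret minimization.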
Bayesian decision-making under misspecified priors with applications to meta-learning
M Simchowitz, C Tosh… - Advances in …, 2021 - proceedings.neurips.cc
Thompson sampling and other Bayesian sequential decision-making algorithms are among
the most popular approaches to tackle explore/exploit trade-offs in (contextual) bandits. The …
Real-time digital twin-based optimization with predictive simulation learning
Digital twinning presents an exciting opportunity enabling real-time optimization of the
control and operations of cyber-physical systems (CPS) with data-driven simulations, while …
Improving the expected improvement algorithm
The expected improvement (EI) algorithm is a popular strategy for information collection in
optimization under uncertainty. The algorithm is widely known to be too greedy, but …
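For context, the standard EI acquisition this paper critiques has a simple closed form under a Gaussian posterior; a minimal sketch (for maximization, function name hypothetical):

```python
import math

def expected_improvement(mu, sigma, best):
    """Closed-form EI for a Gaussian posterior N(mu, sigma^2),
    measured against the best value observed so far (maximization)."""
    if sigma <= 0:
        return max(mu - best, 0.0)  # degenerate posterior: no uncertainty
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal pdf
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal cdf
    return sigma * (z * cdf + pdf)
```

The "too greedy" critique refers to EI favoring points whose posterior mean is already near the incumbent best, under-weighting high-variance points that carry more information.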
An adaptive targeted field experiment: Job search assistance for refugees in Jordan
We introduce an adaptive targeted treatment assignment methodology for field experiments.
Our Tempered Thompson Algorithm balances the goals of maximizing the precision of …
Fast pure exploration via Frank-Wolfe
We study the problem of active pure exploration with fixed confidence in generic stochastic
bandit environments. The goal of the learner is to answer a query about the environment …