A tutorial on Thompson sampling

DJ Russo, B Van Roy, A Kazerouni… - … and Trends® in …, 2018 - nowpublishers.com
Thompson sampling is an algorithm for online decision problems where actions are taken
sequentially in a manner that must balance between exploiting what is known to maximize …

[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Confidence intervals for policy evaluation in adaptive experiments

V Hadad, DA Hirshberg, R Zhan… - Proceedings of the …, 2021 - National Acad Sciences
Adaptive experimental designs can dramatically improve efficiency in randomized trials. But
with adaptively collected data, common estimators based on sample means and inverse …

Adaptive treatment assignment in experiments for policy choice

M Kasy, A Sautmann - Econometrica, 2021 - Wiley Online Library
Standard experimental designs are geared toward point estimation and hypothesis testing,
while bandit algorithms are geared toward in‐sample outcomes. Here, we instead consider …

Top two algorithms revisited

M Jourdan, R Degenne, D Baudry… - Advances in …, 2022 - proceedings.neurips.cc
Top two algorithms arose as an adaptation of Thompson sampling to best arm identification
in multi-armed bandit models for parametric families of arms. They select the next arm to …

Bayesian decision-making under misspecified priors with applications to meta-learning

M Simchowitz, C Tosh… - Advances in …, 2021 - proceedings.neurips.cc
Thompson sampling and other Bayesian sequential decision-making algorithms are among
the most popular approaches to tackle explore/exploit trade-offs in (contextual) bandits. The …

Real-time digital twin-based optimization with predictive simulation learning

T Goodwin, J Xu, N Celik, CH Chen - Journal of Simulation, 2024 - Taylor & Francis
Digital twinning presents an exciting opportunity enabling real-time optimization of the
control and operations of cyber-physical systems (CPS) with data-driven simulations, while …

Improving the expected improvement algorithm

C Qin, D Klabjan, D Russo - Advances in Neural …, 2017 - proceedings.neurips.cc
The expected improvement (EI) algorithm is a popular strategy for information collection in
optimization under uncertainty. The algorithm is widely known to be too greedy, but …

An adaptive targeted field experiment: Job search assistance for refugees in Jordan

AS Caria, G Gordon, M Kasy, S Quinn… - Journal of the …, 2024 - academic.oup.com
We introduce an adaptive targeted treatment assignment methodology for field experiments.
Our Tempered Thompson Algorithm balances the goals of maximizing the precision of …

Fast pure exploration via Frank-Wolfe

PA Wang, RC Tzeng… - Advances in Neural …, 2021 - proceedings.neurips.cc
We study the problem of active pure exploration with fixed confidence in generic stochastic
bandit environments. The goal of the learner is to answer a query about the environment …