A tutorial on Thompson sampling
Thompson sampling is an algorithm for online decision problems where actions are taken
sequentially in a manner that must balance between exploiting what is known to maximize …
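As a rough illustration of the algorithm this tutorial covers, here is a minimal sketch of Thompson sampling for a Bernoulli bandit with Beta posteriors (the function name and setup are hypothetical, not taken from the paper):

```python
import random

def thompson_sampling(true_probs, horizon, seed=0):
    """Minimal Beta-Bernoulli Thompson sampling on a K-armed bandit."""
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1] * k  # Beta(1, 1) uniform prior for each arm
    beta = [1] * k
    total_reward = 0
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its posterior ...
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])  # ... and play the best one
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward       # posterior update on success
        beta[arm] += 1 - reward    # posterior update on failure
        total_reward += reward
    return total_reward, alpha, beta
```

Because actions are drawn according to their posterior probability of being optimal, the algorithm explores early and concentrates on the best arm as evidence accumulates.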
[BOOK] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Confidence intervals for policy evaluation in adaptive experiments
Adaptive experimental designs can dramatically improve efficiency in randomized trials. But
with adaptively collected data, common estimators based on sample means and inverse …
Adaptive treatment assignment in experiments for policy choice
M Kasy, A Sautmann - Econometrica, 2021 - Wiley Online Library
Standard experimental designs are geared toward point estimation and hypothesis testing,
while bandit algorithms are geared toward in-sample outcomes. Here, we instead consider …
Top two algorithms revisited
Top two algorithms arose as an adaptation of Thompson sampling to best arm identification
in multi-armed bandit models for parametric families of arms. They select the next arm to …
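The arm-selection rule the abstract alludes to can be sketched as follows, assuming Beta-Bernoulli posteriors as above; this is a generic top-two Thompson sampling step, with all names hypothetical:

```python
import random

def top_two_thompson_step(alpha, beta_params, beta_frac=0.5, rng=None):
    """One arm-selection step of top-two Thompson sampling.

    With probability beta_frac play the posterior leader; otherwise
    resample posteriors until a different arm (the challenger) wins.
    """
    rng = rng or random.Random()
    k = len(alpha)

    def posterior_argmax():
        samples = [rng.betavariate(alpha[i], beta_params[i]) for i in range(k)]
        return max(range(k), key=lambda i: samples[i])

    leader = posterior_argmax()
    if rng.random() < beta_frac:
        return leader
    challenger = posterior_argmax()
    while challenger == leader:  # resample until a distinct arm wins
        challenger = posterior_argmax()
    return challenger
```

Forcing play of a challenger keeps the algorithm from committing to the empirical best arm too early, which is what makes the adaptation suitable for best arm identification rather than regret minimization.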
Bayesian decision-making under misspecified priors with applications to meta-learning
M Simchowitz, C Tosh… - Advances in …, 2021 - proceedings.neurips.cc
Thompson sampling and other Bayesian sequential decision-making algorithms are among
the most popular approaches to tackle explore/exploit trade-offs in (contextual) bandits. The …
Real-time digital twin-based optimization with predictive simulation learning
Digital twinning presents an exciting opportunity enabling real-time optimization of the
control and operations of cyber-physical systems (CPS) with data-driven simulations, while …
Improving the expected improvement algorithm
The expected improvement (EI) algorithm is a popular strategy for information collection in
optimization under uncertainty. The algorithm is widely known to be too greedy, but …
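For context, the standard EI acquisition this paper critiques has a simple closed form under a Gaussian posterior; a minimal sketch (for maximization, function name hypothetical):

```python
import math

def expected_improvement(mu, sigma, best):
    """Closed-form EI for a Gaussian posterior N(mu, sigma^2),
    measured against the best value observed so far (maximization)."""
    if sigma <= 0:
        return max(mu - best, 0.0)  # degenerate posterior: no uncertainty
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal pdf
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal cdf
    return sigma * (z * cdf + pdf)
```

The "too greedy" critique refers to EI favoring points whose posterior mean is already near the incumbent best, under-weighting high-variance points that carry more information.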
An adaptive targeted field experiment: Job search assistance for refugees in Jordan
We introduce an adaptive targeted treatment assignment methodology for field experiments.
Our Tempered Thompson Algorithm balances the goals of maximizing the precision of …
Fast pure exploration via Frank-Wolfe
We study the problem of active pure exploration with fixed confidence in generic stochastic
bandit environments. The goal of the learner is to answer a query about the environment …