Online multi-armed bandits with adaptive inference

M Dimakopoulou, Z Ren… - Advances in Neural …, 2021 - proceedings.neurips.cc
During online decision-making in Multi-Armed Bandits (MAB), one needs, at each step, to conduct
inference on the true mean reward of each arm based on the data collected so far …
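The per-step inference this snippet refers to can be illustrated with a standard Beta-Bernoulli posterior update under Thompson sampling; a generic sketch, not the estimator proposed in the paper (arm means and horizon are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.4, 0.6]            # unknown Bernoulli arm means (illustrative)
alpha = np.ones(2)                 # Beta posterior parameters, one pair per arm
beta = np.ones(2)

for _ in range(1000):
    # Thompson sampling: draw one sample per arm from the current posterior
    samples = rng.beta(alpha, beta)
    arm = int(np.argmax(samples))
    reward = rng.binomial(1, true_means[arm])
    # Per-step inference: update the posterior of the pulled arm
    alpha[arm] += reward
    beta[arm] += 1 - reward

post_means = alpha / (alpha + beta)  # current point estimates of the arm means
print(post_means)
```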

Multi-armed bandit experimental design: Online decision-making and adaptive inference

D Simchi-Levi, C Wang - International Conference on …, 2023 - proceedings.mlr.press
The multi-armed bandit is well known for its efficiency in online decision-making, in terms
of minimizing the loss of participants' welfare during experiments (i.e., the regret). In …
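For reference, the regret mentioned here is the welfare lost relative to always playing the best arm; in standard notation (not specific to this paper):

```latex
% Expected cumulative regret over T rounds: arm means \mu_1,\dots,\mu_K,
% best mean \mu^* = \max_k \mu_k, and A_t the arm pulled at round t.
R_T \;=\; T\mu^* \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} \mu_{A_t}\right]
```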

Are sample means in multi-armed bandits positively or negatively biased?

J Shin, A Ramdas, A Rinaldo - Advances in Neural …, 2019 - proceedings.neurips.cc
It is well known that in stochastic multi-armed bandits (MAB), the sample mean of an arm is
typically not an unbiased estimator of its true mean. In this paper, we decouple three …
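The bias in the title is easy to reproduce by simulation: under adaptive (here, greedy) sampling, each arm's sample mean is negatively biased even when all true means are equal. A minimal sketch, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
K, T, reps = 2, 100, 20000
mu = 0.5                                      # both arms have the same true mean

bias = np.zeros(K)
for _ in range(reps):
    pulls = np.ones(K)
    sums = rng.binomial(1, mu, K).astype(float)   # one forced pull per arm
    for _ in range(T):
        arm = int(np.argmax(sums / pulls))        # greedy: adaptive sampling
        sums[arm] += rng.binomial(1, mu)
        pulls[arm] += 1
    bias += sums / pulls - mu

print(bias / reps)  # both entries come out negative: the adaptive-sampling bias
```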

A closer look at the worst-case behavior of multi-armed bandit algorithms

A Kalvit, A Zeevi - Advances in Neural Information …, 2021 - proceedings.neurips.cc
One of the key drivers of complexity in the classical (stochastic) multi-armed bandit (MAB)
problem is the difference between mean rewards in the top two arms, also known as the …
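In standard notation (not taken from the paper), the quantity described is:

```latex
% Gap between the top two arms, with means ordered \mu_{(1)} \ge \mu_{(2)} \ge \dots \ge \mu_{(K)}:
\Delta \;=\; \mu_{(1)} - \mu_{(2)}
% Classical instance-dependent bounds give regret scaling as O\!\big((K/\Delta)\log T\big) when \Delta > 0.
```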

Causal bandits: Online decision-making in endogenous settings

J Zhang, Y Chen, A Singh - arXiv preprint arXiv:2211.08649, 2022 - arxiv.org
The deployment of Multi-Armed Bandits (MAB) has become commonplace in many
economic applications. However, regret guarantees for even state-of-the-art linear bandit …

Sub-sampling for multi-armed bandits

A Baransi, OA Maillard, S Mannor - … 2014, Proceedings, Part I - Springer
The stochastic multi-armed bandit problem is a popular model of the exploration/exploitation
trade-off in sequential decision problems. We introduce a novel algorithm that is based on …
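The proposed algorithm is BESA; the following is a minimal two-arm sketch of the sub-sampling duel it is based on (compare arms on equal-size sub-samples, break ties toward the less-pulled arm), simplified from the paper's full procedure:

```python
import numpy as np

rng = np.random.default_rng(2)

def besa_two_arms(means, T):
    """Minimal sketch of a sub-sampling duel between two Bernoulli arms."""
    hist = [[rng.binomial(1, m)] for m in means]   # one forced pull per arm
    for _ in range(T - 2):
        n = min(len(hist[0]), len(hist[1]))
        # Compare arms on sub-samples of equal size n, drawn without replacement
        sub_means = [np.mean(rng.choice(h, size=n, replace=False)) for h in hist]
        if sub_means[0] == sub_means[1]:
            arm = int(len(hist[0]) > len(hist[1]))  # tie: prefer the less-pulled arm
        else:
            arm = int(np.argmax(sub_means))
        hist[arm].append(rng.binomial(1, means[arm]))
    return [np.mean(h) for h in hist], [len(h) for h in hist]

print(besa_two_arms([0.4, 0.6], 2000))  # the 0.6 arm should dominate the pulls
```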

Scaling multi-armed bandit algorithms

E Fouché, J Komiyama, K Böhm - Proceedings of the 25th ACM SIGKDD …, 2019 - dl.acm.org
The Multi-Armed Bandit (MAB) is a fundamental model capturing the dilemma between
exploration and exploitation in sequential decision making. At every time step, the decision …
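The per-step dilemma described here is canonically resolved by index policies such as UCB1 (Auer et al., 2002); a minimal illustration, separate from this paper's scaling setting:

```python
import math, random

def ucb1(means, T, seed=3):
    """UCB1: pull the arm maximizing empirical mean + exploration bonus."""
    random.seed(seed)
    K = len(means)
    pulls, sums = [0] * K, [0.0] * K
    for t in range(1, T + 1):
        if t <= K:
            arm = t - 1                      # pull each arm once to initialize
        else:
            arm = max(range(K), key=lambda a: sums[a] / pulls[a]
                      + math.sqrt(2 * math.log(t) / pulls[a]))
        reward = 1.0 if random.random() < means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
    return pulls

print(ucb1([0.3, 0.5, 0.7], 5000))  # most pulls should go to the 0.7 arm
```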

Counterfactual data-fusion for online reinforcement learners

A Forney, J Pearl, E Bareinboim - … Conference on Machine …, 2017 - proceedings.mlr.press
The Multi-Armed Bandit problem with Unobserved Confounders (MABUC)
considers decision-making settings where unmeasured variables can influence both the …
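A toy simulation (numbers invented) shows why such settings break naive bandit estimates: when an unobserved U drives both the natural arm choice and the reward, the observational means E[Y|X=x] diverge from the interventional means E[Y|do(X=x)] that a bandit actually earns:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 200_000
# Unobserved confounder U influences both the "natural" arm choice and the reward
U = rng.binomial(1, 0.5, N)
X_obs = U                                  # observational policy: X follows U
p = np.array([[0.9, 0.2],                  # p[x, u] = P(reward=1 | arm x, context u)
              [0.3, 0.6]])
Y_obs = rng.binomial(1, p[X_obs, U])

for x in (0, 1):
    obs = Y_obs[X_obs == x].mean()         # E[Y | X = x]    (confounded)
    do = p[x, U].mean()                    # E[Y | do(X=x)]  (randomized pull)
    print(f"arm {x}: observational {obs:.2f} vs interventional {do:.2f}")
```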

Stochastic rising bandits

AM Metelli, F Trovo, M Pirola… - … Conference on Machine …, 2022 - proceedings.mlr.press
This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., sequential
selection techniques that learn online using only the feedback given by the chosen …
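In the rested version of this setting, "rising" is typically formalized as expected rewards that are non-decreasing and concave in the number of pulls (notation mine, consistent with the paper's setting):

```latex
% Rested rising bandit: \mu_i(n) is arm i's expected reward at its n-th pull.
\mu_i(n+1) \;\ge\; \mu_i(n),
\qquad
\mu_i(n+1) - \mu_i(n) \;\le\; \mu_i(n) - \mu_i(n-1) \quad \text{for all } n.
```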

An experimental design for anytime-valid causal inference on multi-armed bandits

B Liang, I Bojinov - arXiv preprint arXiv:2311.05794, 2023 - arxiv.org
Typically, multi-armed bandit (MAB) experiments are analyzed at the end of the study and
thus require the analyst to specify a fixed sample size in advance. However, in many online …
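The alternative the title points to is anytime-valid inference via confidence sequences, whose defining guarantee (standard, not specific to this paper) is:

```latex
% Anytime-valid coverage: a confidence sequence (C_t)_{t \ge 1} for a parameter \theta satisfies
\Pr\!\big(\exists\, t \ge 1 : \theta \notin C_t\big) \;\le\; \alpha,
% so it remains valid under any data-dependent stopping time, unlike a fixed-n interval,
% which only ensures \Pr(\theta \notin C_n) \le \alpha at the pre-specified sample size n.
```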