Online multi-armed bandits with adaptive inference

M Dimakopoulou, Z Ren… - Advances in Neural …, 2021 - proceedings.neurips.cc
During online decision-making in Multi-Armed Bandits (MAB), one needs, at each step, to conduct
inference on the true mean reward of each arm based on the data collected so far …
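The per-step inference this snippet refers to can be illustrated with a standard Beta-Bernoulli posterior update under Thompson sampling; a generic sketch, not the estimator proposed in the paper (arm means and horizon are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.4, 0.6]            # unknown Bernoulli arm means (illustrative)
alpha = np.ones(2)                 # Beta posterior parameters, one pair per arm
beta = np.ones(2)

for _ in range(1000):
    # Thompson sampling: draw one sample per arm from the current posterior
    samples = rng.beta(alpha, beta)
    arm = int(np.argmax(samples))
    reward = rng.binomial(1, true_means[arm])
    # Per-step inference: update the posterior of the pulled arm
    alpha[arm] += reward
    beta[arm] += 1 - reward

post_means = alpha / (alpha + beta)  # current point estimates of the arm means
print(post_means)
```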

Multi-armed bandit experimental design: Online decision-making and adaptive inference

D Simchi-Levi, C Wang - International Conference on …, 2023 - proceedings.mlr.press
The multi-armed bandit is well known for its efficiency in online decision-making, in terms
of minimizing the loss of participants' welfare during experiments (i.e., the regret). In …
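For reference, the regret mentioned here is the welfare lost relative to always playing the best arm; in standard notation (not specific to this paper):

```latex
% Expected cumulative regret over T rounds: arm means \mu_1,\dots,\mu_K,
% best mean \mu^* = \max_k \mu_k, and A_t the arm pulled at round t.
R_T \;=\; T\mu^* \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} \mu_{A_t}\right]
```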

Are sample means in multi-armed bandits positively or negatively biased?

J Shin, A Ramdas, A Rinaldo - Advances in Neural …, 2019 - proceedings.neurips.cc
It is well known that in stochastic multi-armed bandits (MAB), the sample mean of an arm is
typically not an unbiased estimator of its true mean. In this paper, we decouple three …
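The bias in the title is easy to reproduce by simulation: under adaptive (here, greedy) sampling, each arm's sample mean is negatively biased even when all true means are equal. A minimal sketch, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
K, T, reps = 2, 100, 20000
mu = 0.5                                      # both arms have the same true mean

bias = np.zeros(K)
for _ in range(reps):
    pulls = np.ones(K)
    sums = rng.binomial(1, mu, K).astype(float)   # one forced pull per arm
    for _ in range(T):
        arm = int(np.argmax(sums / pulls))        # greedy: adaptive sampling
        sums[arm] += rng.binomial(1, mu)
        pulls[arm] += 1
    bias += sums / pulls - mu

print(bias / reps)  # both entries come out negative: the adaptive-sampling bias
```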

A closer look at the worst-case behavior of multi-armed bandit algorithms

A Kalvit, A Zeevi - Advances in Neural Information …, 2021 - proceedings.neurips.cc
One of the key drivers of complexity in the classical (stochastic) multi-armed bandit (MAB)
problem is the difference between mean rewards in the top two arms, also known as the …
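In standard notation (not taken from the paper), the quantity described is:

```latex
% Gap between the top two arms, with means ordered \mu_{(1)} \ge \mu_{(2)} \ge \dots \ge \mu_{(K)}:
\Delta \;=\; \mu_{(1)} - \mu_{(2)}
% Classical instance-dependent bounds give regret scaling as O\!\big((K/\Delta)\log T\big) when \Delta > 0.
```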

Causal bandits: Online decision-making in endogenous settings

J Zhang, Y Chen, A Singh - arXiv preprint arXiv:2211.08649, 2022 - arxiv.org
The deployment of Multi-Armed Bandits (MAB) has become commonplace in many
economic applications. However, regret guarantees for even state-of-the-art linear bandit …

Sub-sampling for multi-armed bandits

A Baransi, OA Maillard, S Mannor - … 2014, Proceedings, Part I - Springer
The stochastic multi-armed bandit problem is a popular model of the exploration/exploitation
trade-off in sequential decision problems. We introduce a novel algorithm that is based on …
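The proposed algorithm is BESA; the following is a minimal two-arm sketch of the sub-sampling duel it is based on (compare arms on equal-size sub-samples, break ties toward the less-pulled arm), simplified from the paper's full procedure:

```python
import numpy as np

rng = np.random.default_rng(2)

def besa_two_arms(means, T):
    """Minimal sketch of a sub-sampling duel between two Bernoulli arms."""
    hist = [[rng.binomial(1, m)] for m in means]   # one forced pull per arm
    for _ in range(T - 2):
        n = min(len(hist[0]), len(hist[1]))
        # Compare arms on sub-samples of equal size n, drawn without replacement
        sub_means = [np.mean(rng.choice(h, size=n, replace=False)) for h in hist]
        if sub_means[0] == sub_means[1]:
            arm = int(len(hist[0]) > len(hist[1]))  # tie: prefer the less-pulled arm
        else:
            arm = int(np.argmax(sub_means))
        hist[arm].append(rng.binomial(1, means[arm]))
    return [np.mean(h) for h in hist], [len(h) for h in hist]

print(besa_two_arms([0.4, 0.6], 2000))  # the 0.6 arm should dominate the pulls
```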

Scaling multi-armed bandit algorithms

E Fouché, J Komiyama, K Böhm - Proceedings of the 25th ACM SIGKDD …, 2019 - dl.acm.org
The Multi-Armed Bandit (MAB) is a fundamental model capturing the dilemma between
exploration and exploitation in sequential decision making. At every time step, the decision …
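The per-step dilemma described here is canonically resolved by index policies such as UCB1 (Auer et al., 2002); a minimal illustration, separate from this paper's scaling setting:

```python
import math, random

def ucb1(means, T, seed=3):
    """UCB1: pull the arm maximizing empirical mean + exploration bonus."""
    random.seed(seed)
    K = len(means)
    pulls, sums = [0] * K, [0.0] * K
    for t in range(1, T + 1):
        if t <= K:
            arm = t - 1                      # pull each arm once to initialize
        else:
            arm = max(range(K), key=lambda a: sums[a] / pulls[a]
                      + math.sqrt(2 * math.log(t) / pulls[a]))
        reward = 1.0 if random.random() < means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
    return pulls

print(ucb1([0.3, 0.5, 0.7], 5000))  # most pulls should go to the 0.7 arm
```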

Counterfactual data-fusion for online reinforcement learners

A Forney, J Pearl, E Bareinboim - … Conference on Machine …, 2017 - proceedings.mlr.press
The Multi-Armed Bandit problem with Unobserved Confounders (MABUC)
considers decision-making settings where unmeasured variables can influence both the …
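A toy simulation (numbers invented) shows why such settings break naive bandit estimates: when an unobserved U drives both the natural arm choice and the reward, the observational means E[Y|X=x] diverge from the interventional means E[Y|do(X=x)] that a bandit actually earns:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 200_000
# Unobserved confounder U influences both the "natural" arm choice and the reward
U = rng.binomial(1, 0.5, N)
X_obs = U                                  # observational policy: X follows U
p = np.array([[0.9, 0.2],                  # p[x, u] = P(reward=1 | arm x, context u)
              [0.3, 0.6]])
Y_obs = rng.binomial(1, p[X_obs, U])

for x in (0, 1):
    obs = Y_obs[X_obs == x].mean()         # E[Y | X = x]    (confounded)
    do = p[x, U].mean()                    # E[Y | do(X=x)]  (randomized pull)
    print(f"arm {x}: observational {obs:.2f} vs interventional {do:.2f}")
```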

Stochastic rising bandits

AM Metelli, F Trovo, M Pirola… - … Conference on Machine …, 2022 - proceedings.mlr.press
This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., sequential
selection techniques that learn online using only the feedback given by the chosen …
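In the rested version of this setting, "rising" is typically formalized as expected rewards that are non-decreasing and concave in the number of pulls (notation mine, consistent with the paper's setting):

```latex
% Rested rising bandit: \mu_i(n) is arm i's expected reward at its n-th pull.
\mu_i(n+1) \;\ge\; \mu_i(n),
\qquad
\mu_i(n+1) - \mu_i(n) \;\le\; \mu_i(n) - \mu_i(n-1) \quad \text{for all } n.
```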

An experimental design for anytime-valid causal inference on multi-armed bandits

B Liang, I Bojinov - arXiv preprint arXiv:2311.05794, 2023 - arxiv.org
Typically, multi-armed bandit (MAB) experiments are analyzed at the end of the study and
thus require the analyst to specify a fixed sample size in advance. However, in many online …
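The alternative the title points to is anytime-valid inference via confidence sequences, whose defining guarantee (standard, not specific to this paper) is:

```latex
% Anytime-valid coverage: a confidence sequence (C_t)_{t \ge 1} for a parameter \theta satisfies
\Pr\!\big(\exists\, t \ge 1 : \theta \notin C_t\big) \;\le\; \alpha,
% so it remains valid under any data-dependent stopping time, unlike a fixed-n interval,
% which only ensures \Pr(\theta \notin C_n) \le \alpha at the pre-specified sample size n.
```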