Multi-armed bandit experimental design: Online decision-making and adaptive inference
D Simchi-Levi, C Wang - International Conference on …, 2023 - proceedings.mlr.press
Multi-armed bandit has been well-known for its efficiency in online decision-making in terms
of minimizing the loss of the participants' welfare during experiments (i.e., the regret). In …
Thompson sampling with less exploration is fast and optimal
Abstract We propose $\epsilon $-Exploring Thompson Sampling ($\epsilon $-TS), a
modified version of the Thompson Sampling (TS) algorithm for multi-armed bandits. In …
Finite-time regret of Thompson sampling algorithms for exponential family multi-armed bandits
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits,
where the reward distribution is from a one-dimensional exponential family, which covers …
Double explore-then-commit: Asymptotic optimality and beyond
We study the multi-armed bandit problem with subGaussian rewards. The explore-then-
commit (ETC) strategy, which consists of an exploration phase followed by an exploitation …
Optimal batched best arm identification
We study the batched best arm identification (BBAI) problem, where the learner's goal is to
identify the best arm while switching the policy as little as possible. In particular, we aim to …
Learning for crowdsourcing: Online dispatch for video analytics with guarantee
Crowdsourcing enables a paradigm to conduct the manual annotation and the analytics by
those recruited workers, with their rewards relevant to the quality of the results. Existing …
Efficient and robust sequential decision making algorithms
P Xu - AI Magazine, 2024 - Wiley Online Library
Sequential decision‐making involves making informed decisions based on continuous
interactions with a complex environment. This process is ubiquitous in various applications …
Cooperative multi-agent bandits: Distributed algorithms with optimal individual regret and constant communication costs
Recently, there has been extensive study of cooperative multi-agent multi-armed bandits
where a set of distributed agents cooperatively play the same multi-armed bandit game. The …
Blockchain-enabled Multiple Sensitive Task-offloading Mechanism for MEC Applications
As mobile devices proliferate and mobile applications diversify, Mobile Edge Computing
(MEC) has become widely adopted to efficiently allocate computing resources at the network …
A Batch Sequential Halving Algorithm without Performance Degradation
In this paper, we investigate the problem of pure exploration in the context of multi-armed
bandits, with a specific focus on scenarios where arms are pulled in fixed-size batches …