Multi-armed bandit experimental design: Online decision-making and adaptive inference

D Simchi-Levi, C Wang - International Conference on …, 2023 - proceedings.mlr.press
Multi-armed bandit has been well-known for its efficiency in online decision-making in terms
of minimizing the loss of the participants' welfare during experiments (i.e., the regret). In …

Thompson sampling with less exploration is fast and optimal

T Jin, X Yang, X Xiao, P Xu - International Conference on …, 2023 - proceedings.mlr.press
We propose $\epsilon$-Exploring Thompson Sampling ($\epsilon$-TS), a
modified version of the Thompson Sampling (TS) algorithm for multi-armed bandits. In …

Finite-time regret of Thompson sampling algorithms for exponential family multi-armed bandits

T Jin, P Xu, X Xiao… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits,
where the reward distribution is from a one-dimensional exponential family, which covers …

Double explore-then-commit: Asymptotic optimality and beyond

T Jin, P Xu, X Xiao, Q Gu - Conference on Learning Theory, 2021 - proceedings.mlr.press
We study the multi-armed bandit problem with subGaussian rewards. The explore-then-
commit (ETC) strategy, which consists of an exploration phase followed by an exploitation …

Optimal batched best arm identification

T Jin, Y Yang, J Tang, X Xiao, P Xu - arXiv preprint arXiv:2310.14129, 2023 - arxiv.org
We study the batched best arm identification (BBAI) problem, where the learner's goal is to
identify the best arm while switching the policy as little as possible. In particular, we aim to …

Learning for crowdsourcing: Online dispatch for video analytics with guarantee

Y Chen, S Zhang, Y Jin, Z Qian, M Xiao… - … -IEEE Conference on …, 2022 - ieeexplore.ieee.org
Crowdsourcing enables a paradigm to conduct the manual annotation and the analytics by
those recruited workers, with their rewards relevant to the quality of the results. Existing …

Efficient and robust sequential decision making algorithms

P Xu - AI Magazine, 2024 - Wiley Online Library
Sequential decision-making involves making informed decisions based on continuous
interactions with a complex environment. This process is ubiquitous in various applications …

Cooperative multi-agent bandits: Distributed algorithms with optimal individual regret and constant communication costs

L Yang, X Wang, M Hajiesmaili, L Zhang, J Lui… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, there has been extensive study of cooperative multi-agent multi-armed bandits
where a set of distributed agents cooperatively play the same multi-armed bandit game. The …

Blockchain-enabled Multiple Sensitive Task-offloading Mechanism for MEC Applications

Y Xu, H Li, C Zhang, Z Tang, X Zhong… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
As mobile devices proliferate and mobile applications diversify, Mobile Edge Computing
(MEC) has become widely adopted to efficiently allocate computing resources at the network …

A Batch Sequential Halving Algorithm without Performance Degradation

S Koyamada, S Nishimori, S Ishii - arXiv preprint arXiv:2406.00424, 2024 - arxiv.org
In this paper, we investigate the problem of pure exploration in the context of multi-armed
bandits, with a specific focus on scenarios where arms are pulled in fixed-size batches …