Almost optimal anytime algorithm for batched multi-armed bandits

D Simchi-Levi, C Wang - International Conference on …, 2023 - proceedings.mlr.press

The multi-armed bandit is well known for its efficiency in online decision-making, in terms
of minimizing the loss of participants' welfare during experiments (i.e., the regret). In …

Cited by: 34 | Related articles | All 2 versions
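
The regret mentioned in this snippet is usually defined as follows. Take $K$ arms with mean rewards $\mu_1,\dots,\mu_K$, let $A_t$ be the arm pulled at round $t$, and let $N_i(T)$ be the number of pulls of arm $i$ in the first $T$ rounds. The notation is ours and not necessarily the paper's. The second equality holds because $\sum_t \mu_{A_t} = \sum_i \mu_i N_i(T)$ and $\sum_i N_i(T) = T$.

```latex
R_T = T\mu^\star - \mathbb{E}\Big[\sum_{t=1}^{T} \mu_{A_t}\Big]
    = \sum_{i=1}^{K} \Delta_i \, \mathbb{E}\big[N_i(T)\big],
\qquad \mu^\star = \max_{i} \mu_i, \quad \Delta_i = \mu^\star - \mu_i .
```

Low regret therefore means pulling each suboptimal arm $i$ rarely, with the tolerated number of pulls shrinking as its gap $\Delta_i$ grows.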

Thompson sampling with less exploration is fast and optimal

T Jin, X Yang, X Xiao, P Xu - International Conference on …, 2023 - proceedings.mlr.press

We propose $\epsilon$-Exploring Thompson Sampling ($\epsilon$-TS), a
modified version of the Thompson Sampling (TS) algorithm for multi-armed bandits. In …

Cited by: 11 | Related articles | All 4 versions
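
The snippet says only that $\epsilon$-TS modifies TS. Our reading of the idea is that, when scoring each arm, the algorithm draws from the posterior with probability $\epsilon$ and otherwise uses the empirical mean. The sketch below encodes that reading under stated assumptions; it is not the authors' code. The helper names `eps_ts_choose` and `run_eps_ts` are hypothetical. The unit-variance Gaussian rewards, the approximate posterior N(mean, 1/n), and the default eps = 0.2 are also our own choices.

```python
import random


def eps_ts_choose(counts, sums, eps, rng):
    """Pick an arm (hypothetical helper, not the authors' code).

    Assumes counts[i] >= 1 for every arm and unit-variance Gaussian rewards,
    so arm i's posterior mean is taken as N(sums[i]/counts[i], 1/counts[i]).
    """
    scores = []
    for n, s in zip(counts, sums):
        mean = s / n
        if rng.random() < eps:
            # Explore: score this arm with a sample from its posterior.
            scores.append(rng.gauss(mean, (1.0 / n) ** 0.5))
        else:
            # Exploit: score this arm with its empirical mean (greedy).
            scores.append(mean)
    return max(range(len(scores)), key=scores.__getitem__)


def run_eps_ts(means, horizon, eps=0.2, seed=0):
    """Simulate the sketch on Gaussian arms; returns pull counts per arm."""
    rng = random.Random(seed)
    k = len(means)
    if horizon < k:
        raise ValueError("horizon must allow one initial pull per arm")
    counts, sums = [0] * k, [0.0] * k
    # Pull each arm once so every empirical mean is defined.
    for i in range(k):
        counts[i] += 1
        sums[i] += rng.gauss(means[i], 1.0)
    for _ in range(horizon - k):
        a = eps_ts_choose(counts, sums, eps, rng)
        counts[a] += 1
        sums[a] += rng.gauss(means[a], 1.0)
    return counts
```

For example, `run_eps_ts([0.0, 0.5, 1.0], horizon=1000)` returns how often each arm was pulled. The two endpoints make the design clear. With eps = 1 every arm is scored by a posterior sample, which is plain TS. With eps = 0 the rule is purely greedy.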

Finite-time regret of Thompson sampling algorithms for exponential family multi-armed bandits

T Jin, P Xu, X Xiao… - Advances in Neural …, 2022 - proceedings.neurips.cc

We study the regret of Thompson sampling (TS) algorithms for exponential family bandits,
where the reward distribution is from a one-dimensional exponential family, which covers …

Cited by: 15 | Related articles | All 10 versions
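
The Bernoulli distribution is a one-parameter exponential family, and its conjugate Beta prior gives the textbook instance of Thompson sampling. The sketch below is that standard Beta-Bernoulli TS and is not the specific algorithms analyzed in the paper. The function name `bernoulli_ts` and the uniform Beta(1, 1) prior are our assumptions.

```python
import random


def bernoulli_ts(means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling; returns (counts, alpha, beta)."""
    rng = random.Random(seed)
    k = len(means)
    alpha = [1] * k  # 1 + observed successes (Beta(1, 1) is the uniform prior)
    beta = [1] * k   # 1 + observed failures
    counts = [0] * k
    for _ in range(horizon):
        # Draw one plausible mean per arm from its posterior; play the argmax.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        a = max(range(k), key=samples.__getitem__)
        reward = 1 if rng.random() < means[a] else 0
        # Conjugate update: the posterior stays Beta.
        alpha[a] += reward
        beta[a] += 1 - reward
        counts[a] += 1
    return counts, alpha, beta
```

Each arm's exploration is driven by the spread of its Beta posterior, which narrows as the arm is pulled, so there is no explicit exploration parameter to tune.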

Double explore-then-commit: Asymptotic optimality and beyond

T Jin, P Xu, X Xiao, Q Gu - Conference on Learning Theory, 2021 - proceedings.mlr.press

We study the multi-armed bandit problem with subGaussian rewards. The
explore-then-commit (ETC) strategy, which consists of an exploration phase followed by an exploitation …

Cited by: 25 | Related articles | All 7 versions
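
The snippet describes ETC as an exploration phase followed by an exploitation phase. The sketch below is the classic single-pass ETC, not the paper's Double ETC. The function name `explore_then_commit`, the per-arm exploration count `m`, and the unit-variance Gaussian reward simulation are our assumptions.

```python
import random


def explore_then_commit(means, horizon, m, seed=0):
    """Classic ETC; returns (committed_arm, pull_counts)."""
    rng = random.Random(seed)
    k = len(means)
    if m * k > horizon:
        raise ValueError("exploration budget m*k exceeds the horizon")
    sums = [0.0] * k
    # Exploration phase: pull every arm exactly m times.
    for i in range(k):
        for _ in range(m):
            sums[i] += rng.gauss(means[i], 1.0)
    # Commit to the arm with the highest empirical mean (ties -> lowest index).
    best = max(range(k), key=lambda i: sums[i] / m)
    # Exploitation phase: every remaining round goes to `best`.
    counts = [m] * k
    counts[best] += horizon - m * k
    return best, counts
```

The whole design hinges on `m`. If it is too small the commit may lock onto a suboptimal arm for the rest of the horizon. If it is too large, m pulls are wasted on every suboptimal arm.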

Optimal batched best arm identification

T Jin, Y Yang, J Tang, X Xiao, P Xu - arXiv preprint arXiv:2310.14129, 2023 - arxiv.org

We study the batched best arm identification (BBAI) problem, where the learner's goal is to
identify the best arm while switching the policy as few times as possible. In particular, we aim to …

Cited by: 3 | Related articles | All 2 versions
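
A simple way to limit policy switches is batched successive elimination. In each batch, every surviving arm is pulled a fixed number of times, and the policy changes only between batches, when arms are eliminated by confidence bounds. This is a generic sketch, not the paper's algorithm. The name `batched_elimination`, the parameters `batch_pulls` and `max_batches`, the 1-sub-Gaussian (unit-variance Gaussian) rewards, and the union-bound radius are all our assumptions, and the constants are untuned.

```python
import math
import random


def batched_elimination(means, delta=0.05, batch_pulls=50, max_batches=100, seed=0):
    """Batched successive elimination; returns (best_arm, batches_used)."""
    rng = random.Random(seed)
    k = len(means)
    active = list(range(k))
    sums = [0.0] * k
    n = 0  # pulls per surviving arm so far (all survivors share this count)
    batches = 0
    while len(active) > 1 and batches < max_batches:
        # One batch: the pulling policy stays fixed for the whole batch.
        for i in active:
            for _ in range(batch_pulls):
                sums[i] += rng.gauss(means[i], 1.0)
        n += batch_pulls
        batches += 1
        # Hoeffding-style radius with a union bound over arms and rounds.
        radius = math.sqrt(2 * math.log(4 * k * n * n / delta) / n)
        best_lcb = max(sums[i] / n - radius for i in active)
        # The only policy switch: drop arms whose upper bound falls below best_lcb.
        active = [i for i in active if sums[i] / n + radius >= best_lcb]
    best = max(active, key=lambda i: sums[i] / max(n, 1))
    return best, batches
```

The number of policy switches equals the number of batches, so a larger `batch_pulls` trades extra samples for fewer switches.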

Learning for crowdsourcing: Online dispatch for video analytics with guarantee

Y Chen, S Zhang, Y Jin, Z Qian, M Xiao… - … -IEEE Conference on …, 2022 - ieeexplore.ieee.org

Crowdsourcing enables a paradigm in which recruited workers perform manual annotation and analytics,
with their rewards depending on the quality of the results. Existing …

Cited by: 9 | Related articles | All 5 versions

Efficient and robust sequential decision making algorithms

P Xu - AI Magazine, 2024 - Wiley Online Library

Sequential decision‐making involves making informed decisions based on continuous
interactions with a complex environment. This process is ubiquitous in various applications …

Cooperative multi-agent bandits: Distributed algorithms with optimal individual regret and constant communication costs

L Yang, X Wang, M Hajiesmaili, L Zhang, J Lui… - arXiv preprint arXiv …, 2023 - arxiv.org

Recently, there has been extensive study of cooperative multi-agent multi-armed bandits
where a set of distributed agents cooperatively play the same multi-armed bandit game. The …

Cited by: 4 | Related articles | All 2 versions

Blockchain-enabled Multiple Sensitive Task-offloading Mechanism for MEC Applications

Y Xu, H Li, C Zhang, Z Tang, X Zhong… - IEEE Transactions …, 2024 - ieeexplore.ieee.org

As mobile devices proliferate and mobile applications diversify, Mobile Edge Computing
(MEC) has become widely adopted to efficiently allocate computing resources at the network …

A Batch Sequential Halving Algorithm without Performance Degradation

S Koyamada, S Nishimori, S Ishii - arXiv preprint arXiv:2406.00424, 2024 - arxiv.org

In this paper, we investigate the problem of pure exploration in the context of multi-armed
bandits, with a specific focus on scenarios where arms are pulled in fixed-size batches …
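
For context, the sketch below is standard Sequential Halving (Karnin et al., 2013) for fixed-budget best-arm identification. Each round allocates an equal, precomputed number of pulls to every surviving arm, so a round could be dispatched as a single batch. The paper's handling of fixed-size batches is not reproduced here. The name `sequential_halving` and the unit-variance Gaussian reward simulation are our assumptions.

```python
import math
import random


def sequential_halving(means, budget, seed=0):
    """Sequential Halving; returns (best_arm, pulls_used)."""
    rng = random.Random(seed)
    active = list(range(len(means)))
    rounds = math.ceil(math.log2(len(active))) if len(active) > 1 else 0
    used = 0
    for _ in range(rounds):
        # Equal share of budget/rounds for each surviving arm in this round.
        pulls = budget // (len(active) * rounds)
        if pulls == 0:
            raise ValueError("budget too small for this many arms")
        est = {i: sum(rng.gauss(means[i], 1.0) for _ in range(pulls)) / pulls
               for i in active}
        used += pulls * len(active)
        # Keep the top half (rounded up) by empirical mean.
        active.sort(key=lambda i: est[i], reverse=True)
        active = active[: math.ceil(len(active) / 2)]
    return active[0], used
```

Rounding the survivor count up guarantees a single arm remains after ceil(log2 K) rounds, and the integer division keeps the total number of pulls within `budget`.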