Almost optimal anytime algorithm for batched multi-armed bandits

T Jin, J Tang, P Xu, K Huang… - … on Machine Learning, 2021 - proceedings.mlr.press
International Conference on Machine Learning, 2021proceedings.mlr.press
In batched multi-armed bandit problems, the learner can adaptively pull arms and adjust
strategy in batches. In many real applications, not only the regret but also the batch
complexity need to be optimized. Existing batched bandit algorithms usually assume that the
time horizon T is known in advance. However, many applications involve an unpredictable
stopping time. In this paper, we study the anytime batched multi-armed bandit problem. We
propose an anytime algorithm that achieves the asymptotically optimal regret for exponential …
Abstract
In batched multi-armed bandit problems, the learner can adaptively pull arms and adjust strategy in batches. In many real applications, not only the regret but also the batch complexity need to be optimized. Existing batched bandit algorithms usually assume that the time horizon T is known in advance. However, many applications involve an unpredictable stopping time. In this paper, we study the anytime batched multi-armed bandit problem. We propose an anytime algorithm that achieves the asymptotically optimal regret for exponential families of reward distributions with $ O (\log\log T\ilog^{\alpha}(T)) $\footnote {Notation\ilog^{\alpha}(T) is the result of iteratively applying the logarithm function on T for\alpha times, eg,\ilog^{3}(T)=\log\log\log T.} batches, where . Moreover, we prove that for any constant c> 0, no algorithm can achieve the asymptotically optimal regret within c\log\log T batches.
proceedings.mlr.press
以上显示的是最相近的搜索结果。 查看全部搜索结果