Thompson sampling with less exploration is fast and optimal

T Jin, X Yang, X Xiao, P Xu - International Conference on Machine Learning, 2023 - proceedings.mlr.press
Abstract
We propose $\epsilon$-Exploring Thompson Sampling ($\epsilon$-TS), a modified version of the Thompson Sampling (TS) algorithm for multi-armed bandits. In $\epsilon$-TS, arms are selected greedily based on empirical mean rewards with probability $1-\epsilon$, and based on posterior samples obtained from TS with probability $\epsilon$. Here, $\epsilon \in (0,1)$ is a user-defined constant. By reducing exploration, $\epsilon$-TS improves computational efficiency compared to TS while achieving better regret bounds. We establish that $\epsilon$-TS is both minimax optimal and asymptotically optimal for various popular reward distributions, including Gaussian, Bernoulli, Poisson, and Gamma. A key technical advancement in our analysis is the relaxation of the requirement for a stringent anti-concentration bound of the posterior distribution, which was necessary in recent analyses that achieved similar bounds. As a result, $\epsilon$-TS maintains the posterior update structure of TS while minimizing alterations, such as clipping the sampling distribution or solving the inverse of the Kullback-Leibler (KL) divergence between reward distributions, as done in previous work. Furthermore, our algorithm is as easy to implement as TS, but operates significantly faster due to reduced exploration. Empirical evaluations confirm the efficiency and optimality of $\epsilon$-TS.
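
The abstract fully specifies the selection rule: exploit the empirical-mean leader with probability $1-\epsilon$, otherwise fall back to an ordinary TS posterior draw. The following minimal sketch illustrates that rule for Gaussian rewards with a Gaussian posterior $\mathcal{N}(\hat\mu_i, 1/n_i)$ per arm; the prior/posterior choice, initialization, and parameter names here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def epsilon_ts(arm_means, horizon, epsilon=0.1, rng=None):
    """Sketch of epsilon-TS on a Gaussian bandit with unit reward variance.

    Assumption: each arm's posterior is approximated by N(mu_hat_i, 1/n_i).
    """
    rng = rng or np.random.default_rng()
    k = len(arm_means)
    counts = np.zeros(k)   # number of pulls per arm
    sums = np.zeros(k)     # cumulative reward per arm

    for t in range(horizon):
        if t < k:
            arm = t  # pull each arm once to initialize empirical means
        else:
            mu_hat = sums / counts
            if rng.random() < 1 - epsilon:
                # Greedy step: exploit the empirical-mean leader.
                arm = int(np.argmax(mu_hat))
            else:
                # Exploration step: standard TS posterior sampling.
                theta = rng.normal(mu_hat, 1.0 / np.sqrt(counts))
                arm = int(np.argmax(theta))
        reward = rng.normal(arm_means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward

    return counts

# Example usage (hypothetical instance): most pulls should concentrate on arm 2.
# counts = epsilon_ts([0.1, 0.5, 0.9], horizon=10_000, epsilon=0.1)
```

Because posterior sampling is only invoked on an $\epsilon$ fraction of rounds, the per-round cost is dominated by a cheap argmax over empirical means, which is the source of the speedup the abstract describes.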