Eluder dimension and the sample complexity of optimistic exploration

D Russo, B Van Roy - Advances in Neural Information …, 2013 - proceedings.neurips.cc
This paper considers the sample complexity of the multi-armed bandit with dependencies
among the arms. Some of the most successful algorithms for this problem use the principle …

Learning to optimize via posterior sampling

D Russo, B Van Roy - Mathematics of Operations Research, 2014 - pubsonline.informs.org
This paper considers the use of a simple posterior sampling algorithm to balance between
exploration and exploitation when learning to optimize actions such as in multiarmed bandit …

X-Armed Bandits

S Bubeck, R Munos, G Stoltz, C Szepesvári - Journal of Machine Learning …, 2011 - jmlr.org
We consider a generalization of stochastic bandits where the set of arms, X, is allowed to be
a generic measurable space and the mean-payoff function is “locally Lipschitz” with respect …

Minimal exploration in structured stochastic bandits

R Combes, S Magureanu… - Advances in Neural …, 2017 - proceedings.neurips.cc
This paper introduces and addresses a wide class of stochastic bandit problems where the
function mapping the arm to the corresponding reward exhibits some known structural …

Unimodal bandits: Regret lower bounds and optimal algorithms

R Combes, A Proutiere - International Conference on …, 2014 - proceedings.mlr.press
We consider stochastic multi-armed bandits where the expected reward is a unimodal
function over partially ordered arms. This important class of problems has been recently …

Information complexity in bandit subset selection

E Kaufmann… - Conference on Learning …, 2013 - proceedings.mlr.press
We consider the problem of efficiently exploring the arms of a stochastic bandit to identify the
best subset. Under the PAC and the fixed-budget formulations, we derive improved bounds …

Thompson sampling and approximate inference

M Phan, Y Abbasi Yadkori… - Advances in Neural …, 2019 - proceedings.neurips.cc
We study the effects of approximate inference on the performance of Thompson sampling in
the $k$-armed bandit problems. Thompson sampling is a successful algorithm for online …

An improved parametrization and analysis of the EXP3++ algorithm for stochastic and adversarial bandits

Y Seldin, G Lugosi - Conference on Learning Theory, 2017 - proceedings.mlr.press
We present a new strategy for gap estimation in randomized algorithms for multiarmed
bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the …
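As an illustrative aside to the entry above: EXP3++ builds on the adversarial-bandit algorithm EXP3 of Auer et al. A minimal sketch of plain EXP3 (not the EXP3++ variant with gap estimation; the function names, parameters, and renormalization step are assumptions for this sketch, not taken from the paper):

```python
import math
import random

def exp3(reward_fn, k, horizon, gamma=0.1, seed=0):
    """Minimal EXP3 sketch: exponential weights with importance-weighted
    reward estimates and a uniform exploration floor of gamma/k per arm.
    Illustrative only; rewards are assumed to lie in [0, 1]."""
    rng = random.Random(seed)
    weights = [1.0] * k
    pulls = [0] * k
    for _ in range(horizon):
        total = sum(weights)
        # Mix the exponential-weights distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / k for w in weights]
        arm = rng.choices(range(k), weights=probs)[0]
        reward = reward_fn(arm)
        # Importance-weighted estimate keeps the reward estimator unbiased.
        est = reward / probs[arm]
        weights[arm] *= math.exp(gamma * est / k)
        # Renormalize to avoid floating-point overflow over long horizons.
        norm = sum(weights)
        weights = [w / norm for w in weights]
        pulls[arm] += 1
    return pulls
```

With a deterministic reward function favoring one arm, the pull counts concentrate on that arm while the gamma/k floor keeps every arm played occasionally.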

Analysis of Thompson sampling for the multi-armed bandit problem

S Agrawal, N Goyal - Conference on Learning Theory, 2012 - proceedings.mlr.press
The multi-armed bandit problem is a popular model for studying exploration/exploitation
trade-off in sequential decision problems. Many algorithms are now available for this well …
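As an illustrative aside to the entry above: for Bernoulli rewards, Thompson sampling maintains a Beta posterior per arm, samples a mean estimate from each posterior, and plays the argmax. A minimal sketch under those assumptions (the variable names and two-armed setup are illustrative, not from the paper):

```python
import random

def thompson_sampling(arm_probs, horizon, seed=0):
    """Minimal Beta-Bernoulli Thompson sampling sketch (illustrative only).

    arm_probs: true Bernoulli success probability of each arm (simulation only).
    Returns the number of times each arm was pulled over the horizon.
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    successes = [1] * k  # Beta(1, 1) uniform prior on each arm's mean
    failures = [1] * k
    pulls = [0] * k
    for _ in range(horizon):
        # Sample a mean estimate from each arm's Beta posterior, play the argmax.
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < arm_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

As the posteriors sharpen, samples from the better arm's posterior win the argmax more often, so exploration of inferior arms tapers off on its own.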

Better Algorithms for Benign Bandits

E Hazan, S Kale - Journal of Machine Learning Research, 2011 - jmlr.org
The online multi-armed bandit problem and its generalizations are repeated decision
making problems, where the goal is to select one of several possible decisions in every …