Eluder dimension and the sample complexity of optimistic exploration
This paper considers the sample complexity of the multi-armed bandit with dependencies
among the arms. Some of the most successful algorithms for this problem use the principle …
Learning to optimize via posterior sampling
This paper considers the use of a simple posterior sampling algorithm to balance between
exploration and exploitation when learning to optimize actions such as in multiarmed bandit …
X-Armed Bandits
We consider a generalization of stochastic bandits where the set of arms, X, is allowed to be
a generic measurable space and the mean-payoff function is “locally Lipschitz” with respect …
Minimal exploration in structured stochastic bandits
R Combes, S Magureanu… - Advances in Neural …, 2017 - proceedings.neurips.cc
This paper introduces and addresses a wide class of stochastic bandit problems where the
function mapping the arm to the corresponding reward exhibits some known structural …
Unimodal bandits: Regret lower bounds and optimal algorithms
R Combes, A Proutiere - International Conference on …, 2014 - proceedings.mlr.press
We consider stochastic multi-armed bandits where the expected reward is a unimodal
function over partially ordered arms. This important class of problems has been recently …
Information complexity in bandit subset selection
E Kaufmann… - Conference on Learning …, 2013 - proceedings.mlr.press
We consider the problem of efficiently exploring the arms of a stochastic bandit to identify the
best subset. Under the PAC and the fixed-budget formulations, we derive improved bounds …
Thompson sampling and approximate inference
M Phan, Y Abbasi Yadkori… - Advances in Neural …, 2019 - proceedings.neurips.cc
We study the effects of approximate inference on the performance of Thompson sampling in
the $k$-armed bandit problems. Thompson sampling is a successful algorithm for online …
An improved parametrization and analysis of the EXP3++ algorithm for stochastic and adversarial bandits
We present a new strategy for gap estimation in randomized algorithms for multiarmed
bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the …
Analysis of Thompson sampling for the multi-armed bandit problem
The multi-armed bandit problem is a popular model for studying exploration/exploitation
trade-off in sequential decision problems. Many algorithms are now available for this well …
Better Algorithms for Benign Bandits
The online multi-armed bandit problem and its generalizations are repeated decision
making problems, where the goal is to select one of several possible decisions in every …