Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Regret analysis of stochastic and nonstochastic multi-armed bandit problems

S Bubeck, N Cesa-Bianchi - Foundations and Trends® in …, 2012 - nowpublishers.com
Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …
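
The exploration-exploitation trade-off is easiest to see in code. Below is a minimal epsilon-greedy sketch, a standard textbook strategy rather than anything specific to this monograph; the Bernoulli arms, horizon, and epsilon value are illustrative assumptions.

```python
import random

def epsilon_greedy(pull, n_arms, horizon, epsilon=0.1):
    """With probability epsilon explore a random arm; otherwise exploit
    the arm with the best empirical mean reward so far."""
    counts = [0] * n_arms        # number of pulls per arm
    means = [0.0] * n_arms       # empirical mean reward per arm
    total = 0.0
    for _ in range(horizon):
        if random.random() < epsilon or 0 in counts:
            arm = random.randrange(n_arms)                     # explore
        else:
            arm = max(range(n_arms), key=lambda a: means[a])   # exploit
        reward = pull(arm)                  # observe this round's reward
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]      # running mean
        total += reward
    return total

# Toy usage: three Bernoulli arms with hidden success probabilities.
probs = [0.2, 0.5, 0.8]
print(epsilon_greedy(lambda a: float(random.random() < probs[a]), 3, 10_000))
```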

The best of both worlds: Stochastic and adversarial bandits

S Bubeck, A Slivkins - Conference on Learning Theory, 2012 - proceedings.mlr.press
We present a new bandit algorithm, SAO (Stochastic and Adversarial Optimal), whose regret
is (essentially) optimal both for adversarial rewards and for stochastic rewards. Specifically …

Sponsored search auctions: Recent advances and future directions

T Qin, W Chen, TY Liu - ACM Transactions on Intelligent Systems and …, 2015 - dl.acm.org
Sponsored search has proven to be a successful business model, and sponsored
search auctions have become a hot research direction. There have been many exciting …

Learning prices for repeated auctions with strategic buyers

K Amin, A Rostamizadeh… - Advances in neural …, 2013 - proceedings.neurips.cc
Inspired by real-time ad exchanges for online display advertising, we consider the problem
of inferring a buyer's value distribution for a good when the buyer is repeatedly interacting …
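
To make the setting concrete: the naive approach is to probe the buyer with posted prices and record acceptance rates, which estimates P(value >= price). The sketch below assumes a myopic buyer who accepts whenever her current value meets the price; the paper's focus is strategic buyers, who may reject precisely to lower future prices, so this baseline can fail there. The price grid and value distribution are illustrative.

```python
import random

def estimate_acceptance_curve(buyer_accepts, price_grid, rounds_per_price):
    """Post each grid price repeatedly and record the empirical acceptance
    rate, an estimate of P(value >= price). Assumes a myopic buyer; a
    strategic buyer (the paper's focus) may reject to lower future prices."""
    curve = {}
    for p in price_grid:
        accepts = sum(buyer_accepts(p) for _ in range(rounds_per_price))
        curve[p] = accepts / rounds_per_price
    return curve

# Toy usage: buyer's value is redrawn uniformly on [0, 1] every round.
myopic_buyer = lambda p: random.random() >= p
grid = [round(0.1 * i, 1) for i in range(1, 10)]
print(estimate_acceptance_curve(myopic_buyer, grid, 500))
```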

Dynamic pricing with limited supply

M Babaioff, S Dughmi, R Kleinberg, A Slivkins - 2015 - dl.acm.org
We consider the problem of designing revenue-maximizing online posted-price mechanisms
when the seller has limited supply. A seller has k identical items for sale and is facing n …
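
As a point of reference for this setting, here is the simplest baseline: a fixed posted price with k identical items, where each arriving buyer purchases iff her private value meets the price and stock remains. This is a sketch of the setting, not the paper's revenue-maximizing mechanism; the price and value distribution are assumptions.

```python
import random

def posted_price_limited_supply(price, k, buyer_values):
    """Fixed posted price with k identical items: each arriving buyer
    purchases iff her private value meets the price and stock remains."""
    revenue, stock = 0.0, k
    for value in buyer_values:
        if stock > 0 and value >= price:
            revenue += price
            stock -= 1
    return revenue

# Toy usage: n = 100 buyers with i.i.d. uniform [0, 1] values, k = 5 items.
values = [random.random() for _ in range(100)]
print(posted_price_limited_supply(price=0.7, k=5, buyer_values=values))
```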

Bandits and experts in metric spaces

R Kleinberg, A Slivkins, E Upfal - Journal of the ACM (JACM), 2019 - dl.acm.org
In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a
sequence of trials to maximize the total payoff of the chosen strategies. While the …
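
A minimal concrete instance: run standard UCB1 over a fixed uniform grid of an arm space such as [0, 1]. A fixed grid is the naive approach in a metric space; the paper's algorithms adapt to the metric structure, which this sketch deliberately does not. Grid size, horizon, and the Lipschitz reward curve are illustrative assumptions.

```python
import math
import random

def ucb1_on_grid(pull, n_points, horizon):
    """UCB1 over a fixed uniform grid of the arm space [0, 1]."""
    grid = [i / (n_points - 1) for i in range(n_points)]
    counts = [0] * n_points
    means = [0.0] * n_points
    for t in range(1, horizon + 1):
        if t <= n_points:
            arm = t - 1              # pull every grid point once first
        else:
            # UCB index: empirical mean plus a confidence radius
            arm = max(range(n_points),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        reward = pull(grid[arm])
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return grid[max(range(n_points), key=lambda a: means[a])]

# Toy usage: Lipschitz mean-reward curve peaked at x = 0.6, Gaussian noise.
f = lambda x: 1.0 - abs(x - 0.6)
print(ucb1_on_grid(lambda x: f(x) + random.gauss(0.0, 0.1), 11, 5_000))
```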

Bayesian incentive-compatible bandit exploration

Y Mansour, A Slivkins, V Syrgkanis - Proceedings of the Sixteenth ACM …, 2015 - dl.acm.org
Individual decision makers consume information revealed by previous decision makers
and produce information that may help future decision makers. This phenomenon is …

Bayesian exploration: Incentivizing exploration in Bayesian games

Y Mansour, A Slivkins, V Syrgkanis… - Operations …, 2022 - pubsonline.informs.org
We consider a ubiquitous scenario in the internet economy in which individual decision makers
(henceforth, agents) both produce and consume information as they make strategic choices …

Characterizing truthful multi-armed bandit mechanisms

M Babaioff, Y Sharma, A Slivkins - … of the 10th ACM conference on …, 2009 - dl.acm.org
We consider a multi-round auction setting motivated by pay-per-click auctions for Internet
advertising. In each round the auctioneer selects an advertiser and shows her ad, which is …
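
The per-round protocol reads naturally as a bandit over advertisers in which the auctioneer learns click-through rates online. The toy loop below selects an advertiser by bid times estimated CTR with epsilon exploration and records clicks; it models the setting only, not the truthful mechanisms the paper characterizes, and payments are omitted. Bids, click probabilities, and epsilon are assumptions.

```python
import random

def pay_per_click_rounds(bids, click_prob, horizon, epsilon=0.1):
    """Each round: pick one advertiser, show her ad, observe a click (or
    not). Selection greedily maximizes bid * estimated CTR, with epsilon
    exploration. Payments are omitted; this sketches the setting, not a
    truthful mechanism."""
    n = len(bids)
    shown = [0] * n      # impressions per advertiser
    clicks = [0] * n     # observed clicks per advertiser
    for _ in range(horizon):
        if random.random() < epsilon or 0 in shown:
            i = random.randrange(n)                                # explore
        else:
            i = max(range(n), key=lambda a: bids[a] * clicks[a] / shown[a])
        shown[i] += 1
        clicks[i] += int(random.random() < click_prob[i])   # click observed?
    return [c / s if s else 0.0 for c, s in zip(clicks, shown)]  # learned CTRs

print(pay_per_click_rounds(bids=[1.0, 2.0, 0.5],
                           click_prob=[0.3, 0.1, 0.5], horizon=10_000))
```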