Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Regret analysis of stochastic and nonstochastic multi-armed bandit problems
S Bubeck, N Cesa-Bianchi - Foundations and Trends® in …, 2012 - nowpublishers.com
Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …
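The exploration-exploitation trade-off mentioned in this abstract can be illustrated with a minimal epsilon-greedy bandit sketch; the arm means, epsilon, and horizon below are illustrative assumptions, not taken from any of the cited works.

```python
import random

def epsilon_greedy(arm_means, horizon=10_000, epsilon=0.1, seed=0):
    """Epsilon-greedy on K Bernoulli arms: explore with prob. epsilon,
    otherwise exploit the arm with the best empirical mean so far.
    All parameters here are illustrative, not from the cited papers."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k        # number of pulls per arm
    estimates = [0.0] * k   # running mean reward per arm
    total_reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                        # explore: random arm
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit: best estimate
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        # incremental mean update
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return total_reward, counts

reward, counts = epsilon_greedy([0.3, 0.5, 0.7])
```

With a clear gap between arms, the best arm (mean 0.7) ends up pulled far more often than the others, which is the "staying with the option" side of the balance the abstract describes.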
The best of both worlds: Stochastic and adversarial bandits
S Bubeck, A Slivkins - Conference on Learning Theory, 2012 - proceedings.mlr.press
We present a new bandit algorithm, SAO (Stochastic and Adversarial Optimal), whose regret
is (essentially) optimal both for adversarial rewards and for stochastic rewards. Specifically …
Sponsored search auctions: Recent advances and future directions
Sponsored search has been proven to be a successful business model, and sponsored
search auctions have become a hot research direction. There have been many exciting …
Learning prices for repeated auctions with strategic buyers
K Amin, A Rostamizadeh… - Advances in neural …, 2013 - proceedings.neurips.cc
Inspired by real-time ad exchanges for online display advertising, we consider the problem
of inferring a buyer's value distribution for a good when the buyer is repeatedly interacting …
Dynamic pricing with limited supply
We consider the problem of designing revenue-maximizing online posted-price mechanisms
when the seller has limited supply. A seller has k identical items for sale and is facing n …
Bandits and experts in metric spaces
In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a
sequence of trials to maximize the total payoff of the chosen strategies. While the …
Bayesian incentive-compatible bandit exploration
Individual decision-makers consume information revealed by previous decision-makers,
and produce information that may help future decision-makers. This phenomenon is …
Bayesian exploration: Incentivizing exploration in Bayesian games
We consider a ubiquitous scenario in the internet economy when individual decision makers
(henceforth, agents) both produce and consume information as they make strategic choices …
Characterizing truthful multi-armed bandit mechanisms
We consider a multi-round auction setting motivated by pay-per-click auctions for Internet
advertising. In each round the auctioneer selects an advertiser and shows her ad, which is …