Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Dual mirror descent for online allocation problems
S Balseiro, H Lu, V Mirrokni - International Conference on …, 2020 - proceedings.mlr.press
We consider online allocation problems with concave revenue functions and resource
constraints, which are central problems in revenue management and online advertising. In …
Bandits with knapsacks
Multi-armed bandit problems are the predominant theoretical model of exploration-
exploitation tradeoffs in learning, and they have countless applications ranging from medical …
Online matching and ad allocation
A Mehta - … and Trends® in Theoretical Computer Science, 2013 - nowpublishers.com
Matching is a classic problem with a rich history and a significant impact, both on the theory
of algorithms and in practice. Recently there has been a surge of interest in the online …
Online task assignment in crowdsourcing markets
We explore the problem of assigning heterogeneous tasks to workers with different,
unknown skill sets in crowdsourcing markets such as Amazon Mechanical Turk. We first …
Real-time bidding for online advertising: measurement and analysis
Real-time bidding (RTB), aka programmatic buying, has recently become the fastest
growing area in online advertising. Instead of bulk buying and inventory-centric buying …
A dynamic near-optimal algorithm for online linear programming
A natural optimization model that formulates many online resource allocation problems is
the online linear programming (LP) problem in which the constraint matrix is revealed …
Real-time optimization of personalized assortments
N Golrezaei, H Nazerzadeh… - Management …, 2014 - pubsonline.informs.org
Motivated by the availability of real-time data on customer characteristics, we consider the
problem of personalizing the assortment of products for each arriving customer. Using actual …
Bandits with concave rewards and convex knapsacks
S Agrawal, NR Devanur - Proceedings of the fifteenth ACM conference …, 2014 - dl.acm.org
In this paper, we consider a very general model for the exploration-exploitation tradeoff, which
allows arbitrary concave rewards and convex constraints on the decisions across time, in …
Adversarial bandits with knapsacks
We consider Bandits with Knapsacks (henceforth, BwK), a general model for multi-armed
bandits under supply/budget constraints. In particular, a bandit algorithm needs to solve a …