Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Achieving fairness in the stochastic multi-armed bandit problem
We study an interesting variant of the stochastic multi-armed bandit problem, which we call
the Fair-MAB problem, where, in addition to the objective of maximizing the sum of expected …
No-regret learning in time-varying zero-sum games
Learning from repeated play in a fixed two-player zero-sum game is a classic problem in
game theory and online learning. We consider a variant of this problem where the game …
A unifying framework for online optimization with long-term constraints
We study online learning problems in which a decision maker has to take a sequence of
decisions subject to $m$ long-term constraints. The goal of the decision maker is to …
No-regret learning in bilateral trade via global budget balance
Bilateral trade models the problem of intermediating between two rational agents—a seller
and a buyer—both characterized by a private valuation for an item they want to trade. We …
Versatile dueling bandits: Best-of-both world analyses for learning from relative preferences
A Saha, P Gaillard - International Conference on Machine …, 2022 - proceedings.mlr.press
We study the problem of $K$-armed dueling bandit for both stochastic and adversarial
environments, where the goal of the learner is to aggregate information through relative …
Learning equilibria in matching markets from bandit feedback
Large-scale, two-sided matching platforms must find market outcomes that align with user
preferences while simultaneously learning these preferences from data. But since …
Learning to bid in repeated first-price auctions with budgets
Budget management strategies in repeated auctions have received growing attention in
online advertising markets. However, previous work on budget management in online …
Autobidders with budget and ROI constraints: Efficiency, regret, and pacing dynamics
We study a game between autobidding algorithms that compete in an online advertising
platform. Each autobidder is tasked with maximizing its advertiser's total value over multiple …
Adversarial attacks on linear contextual bandits
Contextual bandit algorithms are applied in a wide range of domains, from advertising to
recommender systems, from clinical trials to education. In many of these domains, malicious …