Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Achieving fairness in the stochastic multi-armed bandit problem

V Patil, G Ghalme, V Nair, Y Narahari - Journal of Machine Learning …, 2021 - jmlr.org
We study an interesting variant of the stochastic multi-armed bandit problem, which we call
the Fair-MAB problem, where, in addition to the objective of maximizing the sum of expected …

No-regret learning in time-varying zero-sum games

M Zhang, P Zhao, H Luo… - … Conference on Machine …, 2022 - proceedings.mlr.press
Learning from repeated play in a fixed two-player zero-sum game is a classic problem in
game theory and online learning. We consider a variant of this problem where the game …

A unifying framework for online optimization with long-term constraints

M Castiglioni, A Celli, A Marchesi… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study online learning problems in which a decision maker has to take a sequence of
decisions subject to $m$ long-term constraints. The goal of the decision maker is to …

No-regret learning in bilateral trade via global budget balance

M Bernasconi, M Castiglioni, A Celli… - Proceedings of the 56th …, 2024 - dl.acm.org
Bilateral trade models the problem of intermediating between two rational agents—a seller
and a buyer—both characterized by a private valuation for an item they want to trade. We …

Versatile dueling bandits: Best-of-both world analyses for learning from relative preferences

A Saha, P Gaillard - International Conference on Machine …, 2022 - proceedings.mlr.press
We study the problem of $K$-armed dueling bandit for both stochastic and adversarial
environments, where the goal of the learner is to aggregate information through relative …

Learning equilibria in matching markets from bandit feedback

M Jagadeesan, A Wei, Y Wang… - Advances in …, 2021 - proceedings.neurips.cc
Large-scale, two-sided matching platforms must find market outcomes that align with user
preferences while simultaneously learning these preferences from data. But since …

Learning to bid in repeated first-price auctions with budgets

Q Wang, Z Yang, X Deng… - … Conference on Machine …, 2023 - proceedings.mlr.press
Budget management strategies in repeated auctions have received growing attention in
online advertising markets. However, previous work on budget management in online …

Autobidders with budget and ROI constraints: Efficiency, regret, and pacing dynamics

B Lucier, S Pattathil, A Slivkins… - The Thirty Seventh …, 2024 - proceedings.mlr.press
We study a game between autobidding algorithms that compete in an online advertising
platform. Each autobidder is tasked with maximizing its advertiser's total value over multiple …

Adversarial attacks on linear contextual bandits

E Garcelon, B Roziere, L Meunier… - Advances in …, 2020 - proceedings.neurips.cc
Contextual bandit algorithms are applied in a wide range of domains, from advertising to
recommender systems, from clinical trials to education. In many of these domains, malicious …