Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Nonstochastic multi-armed bandits with graph-structured feedback

N Alon, N Cesa-Bianchi, C Gentile, S Mannor… - SIAM Journal on …, 2017 - SIAM
We introduce and study a partial-information model of online learning, where a decision
maker repeatedly chooses from a finite set of actions and observes some subset of the …

Optimal no-regret learning for one-sided Lipschitz functions

P Dütting, G Guruganesh… - … on Machine Learning, 2023 - proceedings.mlr.press
Inspired by applications in pricing and contract design, we study the maximization of
one-sided Lipschitz functions, which only provide the (weaker) guarantee that they do not grow …

Fair contextual multi-armed bandits: Theory and experiments

Y Chen, A Cuellar, H Luo, J Modi… - … on Uncertainty in …, 2020 - proceedings.mlr.press
When an AI system interacts with multiple users, it frequently needs to make allocation
decisions. For instance, a virtual agent decides whom to pay attention to in a group, or a …

Contextual bandits with continuous actions: Smoothing, zooming, and adapting

A Krishnamurthy, J Langford, A Slivkins… - Journal of Machine …, 2020 - jmlr.org
We study contextual bandit learning with an abstract policy class and continuous action
space. We obtain two qualitatively different regret bounds: one competes with a smoothed …

Learning to bid optimally and efficiently in adversarial first-price auctions

Y Han, Z Zhou, A Flores, E Ordentlich… - arXiv preprint arXiv …, 2020 - arxiv.org
First-price auctions have very recently swept the online advertising industry, replacing
second-price auctions as the predominant auction mechanism on many platforms. This shift …

Optimal no-regret learning in repeated first-price auctions

Y Han, T Weissman, Z Zhou - Operations Research, 2024 - pubsonline.informs.org
We study online learning in repeated first-price auctions where a bidder, only observing the
winning bid at the end of each auction, learns to adaptively bid to maximize the cumulative …

Chaining meets chain rule: Multilevel entropic regularization and training of neural networks

AR Asadi, E Abbe - Journal of Machine Learning Research, 2020 - jmlr.org
We derive generalization and excess risk bounds for neural networks using a family of
complexity measures based on a multilevel relative entropy. The bounds are obtained by …

Contextual pricing for Lipschitz buyers

J Mao, R Leme, J Schneider - Advances in Neural …, 2018 - proceedings.neurips.cc
We investigate the problem of learning a Lipschitz function from binary feedback. In this
problem, a learner is trying to learn a Lipschitz function $f:[0,1]^d \rightarrow [0,1]$ over …

Efficient contextual bandits with continuous actions

M Majzoubi, C Zhang, R Chari… - Advances in …, 2020 - proceedings.neurips.cc
We create a computationally tractable learning algorithm for contextual bandits with
continuous actions having unknown structure. The new reduction-style algorithm composes …