Beyond UCB: Optimal and efficient contextual bandits with regression oracles
A fundamental challenge in contextual bandits is to develop flexible, general-purpose
algorithms with computational requirements no worse than classical supervised learning …
Adapting to misspecification in contextual bandits
A major research direction in contextual bandits is to develop algorithms that are
computationally efficient, yet support flexible, general-purpose function approximation …
Smoothed online learning is as easy as statistical learning
Much of modern learning theory has been split between two regimes: the classical offline
setting, where data arrive independently, and the online setting, where data arrive …
Optimal dynamic regret in exp-concave online learning
We consider the problem of Zinkevich (2003)-style dynamic regret minimization in online
learning with \emph{exp-concave} losses. We show that whenever improper learning is …
Optimal dynamic regret in proper online learning with strongly convex losses and beyond
We study the framework of universal dynamic regret minimization with strongly convex
losses. We answer an open problem in Baby and Wang (2021) by showing that in a proper …
Unconstrained dynamic regret via sparse coding
Z Zhang, A Cutkosky… - Advances in Neural …, 2024 - proceedings.neurips.cc
Motivated by the challenge of nonstationarity in sequential decision making, we study Online
Convex Optimization (OCO) under the coupling of two problem structures: the domain is …
Contextual bandits with smooth regret: Efficient learning in continuous action spaces
Designing efficient general-purpose contextual bandit algorithms that work with large—or
even infinite—action spaces would facilitate application to important scenarios such as …
Learning to bid optimally and efficiently in adversarial first-price auctions
First-price auctions have very recently swept the online advertising industry, replacing
second-price auctions as the predominant auction mechanism on many platforms. This shift …
Chaining meets chain rule: Multilevel entropic regularization and training of neural networks
We derive generalization and excess risk bounds for neural networks using a family of
complexity measures based on a multilevel relative entropy. The bounds are obtained by …
Online label shift: Optimal dynamic regret meets practical algorithms
This paper focuses on supervised and unsupervised online label shift, where the class
marginals $Q(y)$ vary but the class-conditionals $Q(x|y)$ remain invariant. In the …