Beyond UCB: Optimal and efficient contextual bandits with regression oracles
A fundamental challenge in contextual bandits is to develop flexible, general-purpose
algorithms with computational requirements no worse than classical supervised learning …
Adapting to misspecification in contextual bandits
A major research direction in contextual bandits is to develop algorithms that are
computationally efficient, yet support flexible, general-purpose function approximation …
Smoothed online learning is as easy as statistical learning
Much of modern learning theory has been split between two regimes: the classical offline
setting, where data arrive independently, and the online setting, where data arrive …
Optimal dynamic regret in exp-concave online learning
We consider the problem of Zinkevich (2003)-style dynamic regret minimization in online
learning with \emph{exp-concave} losses. We show that whenever improper learning is …
Optimal dynamic regret in proper online learning with strongly convex losses and beyond
We study the framework of universal dynamic regret minimization with strongly convex
losses. We answer an open problem in Baby and Wang (2021) by showing that in a proper …
Unconstrained dynamic regret via sparse coding
Z Zhang, A Cutkosky… - Advances in Neural …, 2024 - proceedings.neurips.cc
Motivated by the challenge of nonstationarity in sequential decision making, we study Online
Convex Optimization (OCO) under the coupling of two problem structures: the domain is …
Contextual bandits with smooth regret: Efficient learning in continuous action spaces
Designing efficient general-purpose contextual bandit algorithms that work with large—or
even infinite—action spaces would facilitate application to important scenarios such as …
Learning to bid optimally and efficiently in adversarial first-price auctions
First-price auctions have very recently swept the online advertising industry, replacing
second-price auctions as the predominant auction mechanism on many platforms. This shift …
Chaining meets chain rule: Multilevel entropic regularization and training of neural networks
We derive generalization and excess risk bounds for neural networks using a family of
complexity measures based on a multilevel relative entropy. The bounds are obtained by …
Online label shift: Optimal dynamic regret meets practical algorithms
This paper focuses on supervised and unsupervised online label shift, where the class
marginals $Q(y)$ vary but the class-conditionals $Q(x|y)$ remain invariant. In the …