Robbing the bandit: Less regret in online geometric optimization against an adaptive adversary

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 3080 - Related articles - All 9 versions

[PDF] mit.edu

[BOOK][B] Optimization for machine learning

S Sra, S Nowozin, SJ Wright - 2011 - books.google.com

An up-to-date account of the interplay between optimization and machine learning,
accessible to students and researchers in both communities. The interplay between …

Cited by 1019 - Related articles - All 33 versions

[PDF] mlr.press

Regret bounds for the adaptive control of linear quadratic systems

Y Abbasi-Yadkori, C Szepesvári - Proceedings of the 24th …, 2011 - proceedings.mlr.press

We study the average cost Linear Quadratic (LQ) control problem with unknown model
parameters, also known as the adaptive control problem in the control community. We …

Cited by 445 - Related articles - All 18 versions

[PDF] arxiv.org

Linearly parameterized bandits

P Rusmevichientong… - Mathematics of Operations …, 2010 - pubsonline.informs.org

We consider bandit problems involving a large (possibly infinite) collection of arms, in which
the expected reward of each arm is a linear function of an r-dimensional random vector Z∈ …

Cited by 626 - Related articles - All 20 versions

[PDF] psu.edu

[PDF] Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback.

A Agarwal, O Dekel, L Xiao - COLT, 2010 - Citeseer

Bandit convex optimization is a special case of online convex optimization with partial
information. In this setting, a player attempts to minimize a sequence of adversarially …

Cited by 430 - Related articles - All 7 versions

[PDF] mlr.press

Contextual bandits with similarity information

A Slivkins - Proceedings of the 24th annual Conference On …, 2011 - proceedings.mlr.press

In a multi-armed bandit (MAB) problem, an online algorithm makes a sequence of choices.
In each round it chooses from a time-invariant set of alternatives and receives the payoff …

Cited by 484 - Related articles - All 12 versions

[PDF] sciencedirect.com

Combinatorial bandits

N Cesa-Bianchi, G Lugosi - Journal of Computer and System Sciences, 2012 - Elsevier

We study sequential prediction problems in which, at each time instance, the forecaster
chooses a vector from a given finite set S⊆ Rd. At the same time, the opponent chooses a …

Cited by 527 - Related articles - All 21 versions

[PDF] arxiv.org

Multi-armed bandits in metric spaces

R Kleinberg, A Slivkins, E Upfal - Proceedings of the fortieth annual ACM …, 2008 - dl.acm.org

In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a
sequence of n trials so as to maximize the total payoff of the chosen strategies. While the …

Cited by 545 - Related articles - All 15 versions

[PDF] mlr.press

Contextual bandits with large action spaces: Made practical

Y Zhu, DJ Foster, J Langford… - … Conference on Machine …, 2022 - proceedings.mlr.press

A central problem in sequential decision making is to develop algorithms that are practical
and computationally efficient, yet support the use of flexible, general-purpose models …

Cited by 32 - Related articles - All 3 versions

[PDF] psu.edu

[PDF] Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization.

JD Abernethy, E Hazan, A Rakhlin - COLT, 2008 - Citeseer

Cited by 411 - Related articles - All 13 versions