[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

[BOOK][B] Optimization for machine learning

S Sra, S Nowozin, SJ Wright - 2011 - books.google.com
An up-to-date account of the interplay between optimization and machine learning,
accessible to students and researchers in both communities. The interplay between …

Regret bounds for the adaptive control of linear quadratic systems

Y Abbasi-Yadkori, C Szepesvári - Proceedings of the 24th …, 2011 - proceedings.mlr.press
We study the average cost Linear Quadratic (LQ) control problem with unknown model
parameters, also known as the adaptive control problem in the control community. We …

Linearly parameterized bandits

P Rusmevichientong… - Mathematics of Operations …, 2010 - pubsonline.informs.org
We consider bandit problems involving a large (possibly infinite) collection of arms, in which
the expected reward of each arm is a linear function of an r-dimensional random vector Z ∈ …

[PDF][PDF] Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback.

A Agarwal, O Dekel, L Xiao - COLT, 2010 - Citeseer
Bandit convex optimization is a special case of online convex optimization with partial
information. In this setting, a player attempts to minimize a sequence of adversarially …

Contextual bandits with similarity information

A Slivkins - Proceedings of the 24th annual Conference On …, 2011 - proceedings.mlr.press
In a multi-armed bandit (MAB) problem, an online algorithm makes a sequence of choices.
In each round it chooses from a time-invariant set of alternatives and receives the payoff …

Combinatorial bandits

N Cesa-Bianchi, G Lugosi - Journal of Computer and System Sciences, 2012 - Elsevier
We study sequential prediction problems in which, at each time instance, the forecaster
chooses a vector from a given finite set S ⊆ R^d. At the same time, the opponent chooses a …

Multi-armed bandits in metric spaces

R Kleinberg, A Slivkins, E Upfal - Proceedings of the fortieth annual ACM …, 2008 - dl.acm.org
In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a
sequence of n trials so as to maximize the total payoff of the chosen strategies. While the …

Contextual bandits with large action spaces: Made practical

Y Zhu, DJ Foster, J Langford… - … Conference on Machine …, 2022 - proceedings.mlr.press
A central problem in sequential decision making is to develop algorithms that are practical
and computationally efficient, yet support the use of flexible, general-purpose models …

[PDF][PDF] Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization.

JD Abernethy, E Hazan, A Rakhlin - COLT, 2008 - Citeseer