[BOOK][B] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
[BOOK][B] Optimization for machine learning
An up-to-date account of the interplay between optimization and machine learning,
accessible to students and researchers in both communities. The interplay between …
Regret bounds for the adaptive control of linear quadratic systems
Y Abbasi-Yadkori, C Szepesvári - Proceedings of the 24th …, 2011 - proceedings.mlr.press
We study the average cost Linear Quadratic (LQ) control problem with unknown model
parameters, also known as the adaptive control problem in the control community. We …
Linearly parameterized bandits
P Rusmevichientong… - Mathematics of Operations …, 2010 - pubsonline.informs.org
We consider bandit problems involving a large (possibly infinite) collection of arms, in which
the expected reward of each arm is a linear function of an r-dimensional random vector Z ∈ …
[PDF][PDF] Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback.
Bandit convex optimization is a special case of online convex optimization with partial
information. In this setting, a player attempts to minimize a sequence of adversarially …
Contextual bandits with similarity information
A Slivkins - Proceedings of the 24th annual Conference On …, 2011 - proceedings.mlr.press
In a multi-armed bandit (MAB) problem, an online algorithm makes a sequence of choices.
In each round it chooses from a time-invariant set of alternatives and receives the payoff …
Combinatorial bandits
N Cesa-Bianchi, G Lugosi - Journal of Computer and System Sciences, 2012 - Elsevier
We study sequential prediction problems in which, at each time instance, the forecaster
chooses a vector from a given finite set S ⊆ R^d. At the same time, the opponent chooses a …
Multi-armed bandits in metric spaces
In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a
sequence of n trials so as to maximize the total payoff of the chosen strategies. While the …
Contextual bandits with large action spaces: Made practical
A central problem in sequential decision making is to develop algorithms that are practical
and computationally efficient, yet support the use of flexible, general-purpose models …
[PDF][PDF] Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization.
Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization. Jacob Abernethy, Computer …