[BOOK][B] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Online learning with predictable sequences
A Rakhlin, K Sridharan - Conference on Learning Theory, 2013 - proceedings.mlr.press
We present methods for online linear optimization that take advantage of benign (as
opposed to worst-case) sequences. Specifically if the sequence encountered by the learner …
Corralling a band of bandit algorithms
We study the problem of combining multiple bandit algorithms (that is, online learning
algorithms with partial feedback) with the goal of creating a master algorithm that performs …
[PDF][PDF] Trading regret for efficiency: online convex optimization with long term constraints
In this paper we propose efficient algorithms for solving constrained online convex
optimization problems. Our motivation stems from the observation that most algorithms …
Online convex optimization with time-varying constraints and bandit feedback
In this paper, online convex optimization problem with time-varying constraints is studied
from the perspective of an agent taking sequential actions. Both the objective function and …
Distributed bandit online convex optimization with time-varying coupled inequality constraints
Distributed bandit online convex optimization with time-varying coupled inequality
constraints is considered, motivated by a repeated game between a group of learners and …
Simultaneously learning stochastic and adversarial episodic MDPs with known transition
This work studies the problem of learning episodic Markov Decision Processes with known
transition and bandit feedback. We develop the first algorithm with a "best-of-both …
Efficient projection-free online convex optimization with membership oracle
Z Mhammedi - Conference on Learning Theory, 2022 - proceedings.mlr.press
In constrained convex optimization, existing interior point methods do not scale well with the
dimension of the ambient space. Alternative approaches such as Projected Gradient …
The price of differential privacy for online learning
We design differentially private algorithms for the problem of online linear optimization in the
full information and bandit settings with optimal $O(T^{0.5})$ regret bounds. In the full …
Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs
We develop a new approach to obtaining high probability regret bounds for online learning
with bandit feedback against an adaptive adversary. While existing approaches all require …