[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Online learning with predictable sequences

A Rakhlin, K Sridharan - Conference on Learning Theory, 2013 - proceedings.mlr.press
We present methods for online linear optimization that take advantage of benign (as
opposed to worst-case) sequences. Specifically if the sequence encountered by the learner …

Corralling a band of bandit algorithms

A Agarwal, H Luo, B Neyshabur… - … on Learning Theory, 2017 - proceedings.mlr.press
We study the problem of combining multiple bandit algorithms (that is, online learning
algorithms with partial feedback) with the goal of creating a master algorithm that performs …

[PDF] Trading regret for efficiency: online convex optimization with long term constraints

M Mahdavi, R Jin, T Yang - The Journal of Machine Learning Research, 2012 - jmlr.org
In this paper we propose efficient algorithms for solving constrained online convex
optimization problems. Our motivation stems from the observation that most algorithms …

Online convex optimization with time-varying constraints and bandit feedback

X Cao, KJR Liu - IEEE Transactions on Automatic Control, 2018 - ieeexplore.ieee.org
In this paper, online convex optimization problem with time-varying constraints is studied
from the perspective of an agent taking sequential actions. Both the objective function and …

Distributed bandit online convex optimization with time-varying coupled inequality constraints

X Yi, X Li, T Yang, L Xie, T Chai… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Distributed bandit online convex optimization with time-varying coupled inequality
constraints is considered, motivated by a repeated game between a group of learners and …

Simultaneously learning stochastic and adversarial episodic MDPs with known transition

T Jin, H Luo - Advances in neural information processing …, 2020 - proceedings.neurips.cc
This work studies the problem of learning episodic Markov Decision Processes with known
transition and bandit feedback. We develop the first algorithm with a "best-of-both …

Efficient projection-free online convex optimization with membership oracle

Z Mhammedi - Conference on Learning Theory, 2022 - proceedings.mlr.press
In constrained convex optimization, existing interior point methods do not scale well with the
dimension of the ambient space. Alternative approaches such as Projected Gradient …

The price of differential privacy for online learning

N Agarwal, K Singh - International Conference on Machine …, 2017 - proceedings.mlr.press
We design differentially private algorithms for the problem of online linear optimization in the
full information and bandit settings with optimal $O(T^{0.5})$ regret bounds. In the full …

Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs

CW Lee, H Luo, CY Wei… - Advances in neural …, 2020 - proceedings.neurips.cc
We develop a new approach to obtaining high probability regret bounds for online learning
with bandit feedback against an adaptive adversary. While existing approaches all require …