Interior-point methods for full-information and bandit online learning

Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 2852 - Related articles - All 9 versions

Online learning with predictable sequences

A Rakhlin, K Sridharan - Conference on Learning Theory, 2013 - proceedings.mlr.press

We present methods for online linear optimization that take advantage of benign (as
opposed to worst-case) sequences. Specifically if the sequence encountered by the learner …

Cited by 355 - Related articles - All 16 versions

Corralling a band of bandit algorithms

A Agarwal, H Luo, B Neyshabur… - … on Learning Theory, 2017 - proceedings.mlr.press

We study the problem of combining multiple bandit algorithms (that is, online learning
algorithms with partial feedback) with the goal of creating a master algorithm that performs …

Cited by 176 - Related articles - All 6 versions

Trading regret for efficiency: online convex optimization with long term constraints

M Mahdavi, R Jin, T Yang - The Journal of Machine Learning Research, 2012 - jmlr.org

In this paper we propose efficient algorithms for solving constrained online convex
optimization problems. Our motivation stems from the observation that most algorithms …

Cited by 275 - Related articles - All 11 versions

Online convex optimization with time-varying constraints and bandit feedback

X Cao, KJR Liu - IEEE Transactions on automatic control, 2018 - ieeexplore.ieee.org

In this paper, online convex optimization problem with time-varying constraints is studied
from the perspective of an agent taking sequential actions. Both the objective function and …

Cited by 98 - Related articles - All 3 versions

Distributed bandit online convex optimization with time-varying coupled inequality constraints

X Yi, X Li, T Yang, L Xie, T Chai… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org

Distributed bandit online convex optimization with time-varying coupled inequality
constraints is considered, motivated by a repeated game between a group of learners and …

Cited by 75 - Related articles - All 7 versions

Simultaneously learning stochastic and adversarial episodic mdps with known transition

T Jin, H Luo - Advances in neural information processing …, 2020 - proceedings.neurips.cc

This work studies the problem of learning episodic Markov Decision Processes with known
transition and bandit feedback. We develop the first algorithm with a "best-of-both …

Cited by 62 - Related articles - All 6 versions

Efficient projection-free online convex optimization with membership oracle

Z Mhammedi - Conference on Learning Theory, 2022 - proceedings.mlr.press

In constrained convex optimization, existing interior point methods do not scale well with the
dimension of the ambient space. Alternative approaches such as Projected Gradient …

Cited by 28 - Related articles - All 3 versions

The price of differential privacy for online learning

N Agarwal, K Singh - International Conference on Machine …, 2017 - proceedings.mlr.press

We design differentially private algorithms for the problem of online linear optimization in the
full information and bandit settings with optimal $O(T^{0.5})$ regret bounds. In the full …

Cited by 100 - Related articles - All 8 versions

Bias no more: high-probability data-dependent regret bounds for adversarial bandits and mdps

CW Lee, H Luo, CY Wei… - Advances in neural …, 2020 - proceedings.neurips.cc

We develop a new approach to obtaining high probability regret bounds for online learning
with bandit feedback against an adaptive adversary. While existing approaches all require …

Cited by 58 - Related articles - All 6 versions