High-probability regret bounds for bandit online linear optimization

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com

Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Cited by 1145 · Related articles · All 7 versions

[PDF] nowpublishers.com

Regret analysis of stochastic and nonstochastic multi-armed bandit problems

S Bubeck, N Cesa-Bianchi - Foundations and Trends® in …, 2012 - nowpublishers.com

Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …

Cited by 3207 · Related articles · All 26 versions
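The exploration-exploitation trade-off described in the snippet can be illustrated with UCB1, one of the classical index policies covered in the stochastic-bandit literature. The following is a minimal sketch under assumptions of my own choosing: Bernoulli rewards, illustrative arm means, and the hypothetical helper name `ucb1`. It is not code from the survey. Each arm's score is its empirical mean plus an exploration bonus that shrinks as the arm is pulled more often.

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Hypothetical helper: run UCB1 on Bernoulli arms with hidden `means`.

    Returns how many times each arm was pulled over `horizon` rounds.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    totals = [0.0] * k    # summed observed rewards per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull every arm once to initialise its estimate
        else:
            # exploit (empirical mean) + explore (bonus sqrt(2 ln t / n_i))
            arm = max(
                range(k),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts

counts = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts)
```

The bonus term makes rarely tried arms look optimistic, so they keep being explored. Over time most pulls concentrate on the arm with the highest mean.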

[PDF] arxiv.org

Optimal rates for zero-order convex optimization: The power of two function evaluations

JC Duchi, MI Jordan, MJ Wainwright… - IEEE Transactions on …, 2015 - ieeexplore.ieee.org

We consider derivative-free algorithms for stochastic and nonstochastic convex optimization
problems that use only function values rather than gradients. Focusing on nonasymptotic …

Cited by 524 · Related articles · All 9 versions
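The snippet's idea of optimizing with "only function values rather than gradients" can be illustrated with a generic two-point random-direction gradient estimate plugged into plain descent. This is a sketch under stated assumptions, not the paper's algorithm. The direction distribution (uniform on the unit sphere), the step size, the smoothing radius `delta`, and the names `two_point_grad`, `zero_order_descent` and `quadratic` are all hypothetical choices made for illustration. The estimator forms g = d · (f(x + δu) − f(x − δu)) / (2δ) · u, whose expectation matches the gradient of a smoothed f because E[u uᵀ] = I/d.

```python
import math
import random

def two_point_grad(f, x, delta, rng):
    """Hypothetical helper: estimate grad f(x) from two function values.

    Draws a direction u uniformly on the unit sphere and returns
    d * (f(x + delta*u) - f(x - delta*u)) / (2*delta) * u.
    """
    d = len(x)
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(ui * ui for ui in u))
    u = [ui / norm for ui in u]  # normalise the Gaussian draw onto the sphere
    fp = f([xi + delta * ui for xi, ui in zip(x, u)])
    fm = f([xi - delta * ui for xi, ui in zip(x, u)])
    scale = d * (fp - fm) / (2 * delta)
    return [scale * ui for ui in u]

def zero_order_descent(f, x0, steps, lr, delta, seed=0):
    """Hypothetical helper: gradient descent using only f-evaluations."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        g = two_point_grad(f, x, delta, rng)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

def quadratic(x):
    """Illustrative test objective with minimiser at (1, ..., 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)

x_final = zero_order_descent(quadratic, [0.0] * 5, steps=2000, lr=0.05, delta=1e-4)
print(quadratic(x_final))
```

Each step costs exactly two evaluations of `f`. The factor `d` in the estimator is what makes it unbiased (for the smoothed objective). It is also why dimension enters the rates for this family of methods.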

[PDF] arxiv.org

Making gradient descent optimal for strongly convex stochastic optimization

A Rakhlin, O Shamir, K Sridharan - arXiv preprint arXiv:1109.5647, 2011 - arxiv.org

Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic
optimization problems which arise in machine learning. For strongly convex problems, its …

Cited by 834 · Related articles · All 8 versions

[PDF] mit.edu

[BOOK] Optimization for machine learning

S Sra, S Nowozin, SJ Wright - 2011 - books.google.com

An up-to-date account of the interplay between optimization and machine learning,
accessible to students and researchers in both communities. The interplay between …

Cited by 1019 · Related articles · All 33 versions

[PDF] neurips.cc

Online convex optimization with stochastic constraints

H Yu, M Neely, X Wei - Advances in Neural Information …, 2017 - proceedings.neurips.cc

This paper considers online convex optimization (OCO) with stochastic constraints, which
generalizes Zinkevich's OCO over a known simple fixed set by introducing multiple …

Cited by 230 · Related articles · All 11 versions

[PDF] neurips.cc

Bypassing the simulator: Near-optimal adversarial linear contextual bandits

H Liu, CY Wei, J Zimmert - Advances in Neural Information …, 2024 - proceedings.neurips.cc

We consider the adversarial linear contextual bandit problem, where the loss vectors are
selected fully adversarially and the per-round action set (i.e., the context) is drawn from a fixed …

Cited by 11 · Related articles · All 5 versions

[PDF] psu.edu

[PDF] Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback

A Agarwal, O Dekel, L Xiao - COLT, 2010 - Citeseer

Bandit convex optimization is a special case of online convex optimization with partial
information. In this setting, a player attempts to minimize a sequence of adversarially …

Cited by 430 · Related articles · All 7 versions

[PDF] mlr.press

Dueling RL: Reinforcement learning with trajectory preferences

A Saha, A Pacchiano, J Lee - International Conference on …, 2023 - proceedings.mlr.press

We consider the problem of preference-based reinforcement learning (PbRL), where, unlike
traditional reinforcement learning (RL), an agent receives feedback only in terms of 1 bit …

Cited by 30 · Related articles

[PDF] sciencedirect.com

Combinatorial bandits

N Cesa-Bianchi, G Lugosi - Journal of Computer and System Sciences, 2012 - Elsevier

We study sequential prediction problems in which, at each time instance, the forecaster
chooses a vector from a given finite set S ⊆ R^d. At the same time, the opponent chooses a …

Cited by 527 · Related articles · All 21 versions