Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization.

Online learning: A comprehensive survey

SCH Hoi, D Sahoo, J Lu, P Zhao - Neurocomputing, 2021 - Elsevier

Online learning represents a family of machine learning methods, where a learner attempts
to tackle some predictive (or any type of decision-making) task by learning from a sequence …

Cited by 721 · Related articles · All 6 versions

[BOOK] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 2852 · Related articles · All 9 versions

A modern introduction to online learning

F Orabona - arXiv preprint arXiv:1912.13213, 2019 - arxiv.org

In this monograph, I introduce the basic concepts of Online Learning through a modern view
of Online Convex Optimization. Here, online learning refers to the framework of regret …

Cited by 347 · Related articles · All 3 versions

Introduction to online convex optimization

E Hazan - Foundations and Trends® in Optimization, 2016 - nowpublishers.com

This monograph portrays optimization as a process. In many practical applications the
environment is so complex that it is infeasible to lay out a comprehensive theoretical model …

Cited by 2002 · Related articles · All 17 versions

Regret analysis of stochastic and nonstochastic multi-armed bandit problems

S Bubeck, N Cesa-Bianchi - Foundations and Trends® in …, 2012 - nowpublishers.com

Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …

Cited by 3111 · Related articles · All 26 versions

Online learning and online convex optimization

S Shalev-Shwartz - Foundations and Trends® in Machine …, 2012 - nowpublishers.com

Online learning is a well established learning paradigm which has both theoretical and
practical appeals. The goal of online learning is to make a sequence of accurate predictions …

Cited by 2495 · Related articles · All 19 versions

Online learning algorithms

N Cesa-Bianchi, F Orabona - Annual review of statistics and its …, 2021 - annualreviews.org

Online learning is a framework for the design and analysis of algorithms that build predictive
models by processing data one at the time. Besides being computationally efficient, online …

Cited by 33 · Related articles · All 6 versions

[PDF] Adaptive subgradient methods for online learning and stochastic optimization.

J Duchi, E Hazan, Y Singer - Journal of machine learning research, 2011 - jmlr.org

We present a new family of subgradient methods that dynamically incorporate knowledge of
the geometry of the data observed in earlier iterations to perform more informative gradient …

Cited by 13844 · Related articles · All 25 versions

[BOOK] Optimization for machine learning

S Sra, S Nowozin, SJ Wright - 2011 - books.google.com

An up-to-date account of the interplay between optimization and machine learning,
accessible to students and researchers in both communities. The interplay between …

Cited by 988 · Related articles · All 33 versions

Learning adversarial Markov decision processes with bandit feedback and unknown transition

C Jin, T Jin, H Luo, S Sra, T Yu - International Conference on …, 2020 - proceedings.mlr.press

We consider the task of learning in episodic finite-horizon Markov decision processes with
an unknown transition function, bandit feedback, and adversarial losses. We propose an …

Cited by 97 · Related articles · All 8 versions