A tutorial on Thompson sampling

DJ Russo, B Van Roy, A Kazerouni… - … and Trends® in …, 2018 - nowpublishers.com

Thompson sampling is an algorithm for online decision problems where actions are taken
sequentially in a manner that must balance between exploiting what is known to maximize …

Cited by 1271 · Related articles · All 34 versions

[PDF] ieee.org

Taking the human out of the loop: A review of Bayesian optimization

B Shahriari, K Swersky, Z Wang… - Proceedings of the …, 2015 - ieeexplore.ieee.org

Big Data applications are typically associated with systems involving large numbers of
users, massive complex software systems, and large-scale heterogeneous computing and …

Cited by 5897 · Related articles · All 14 versions

[PDF] tor-lattimore.com

[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 3226 · Related articles · All 9 versions

[PDF] acm.org

Bao: Making learned query optimization practical

R Marcus, P Negi, H Mao, N Tatbul… - Proceedings of the …, 2021 - dl.acm.org

Recent efforts applying machine learning techniques to query optimization have shown few
practical gains due to substantive training overhead, inability to adapt to changes, and poor …

Cited by 245 · Related articles · All 9 versions

[PDF] nowpublishers.com

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com

Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Cited by 1220 · Related articles · All 7 versions

[PDF] ucl.ac.uk

Learning to reinforcement learn

JX Wang, Z Kurth-Nelson, D Tirumala, H Soyer… - arXiv preprint arXiv …, 2016 - arxiv.org

In recent years deep reinforcement learning (RL) systems have attained superhuman
performance in a number of challenging task domains. However, a major limitation of such …

Cited by 1106 · Related articles · All 8 versions

[PDF] mlr.press

Weight uncertainty in neural network

C Blundell, J Cornebise… - … on machine learning, 2015 - proceedings.mlr.press

We introduce a new, efficient, principled and backpropagation-compatible algorithm for
learning a probability distribution on the weights of a neural network, called Bayes by …

Cited by 4381 · Related articles · All 7 versions

[PDF] mlr.press

On kernelized multi-armed bandits

SR Chowdhury, A Gopalan - International Conference on …, 2017 - proceedings.mlr.press

We consider the stochastic bandit problem with a continuous set of arms, with the expected
reward function over the arms assumed to be fixed but unknown. We provide two new …

Cited by 500 · Related articles · All 8 versions

[PDF] mlr.press

Non-stochastic best arm identification and hyperparameter optimization

K Jamieson, A Talwalkar - Artificial intelligence and statistics, 2016 - proceedings.mlr.press

Motivated by the task of hyperparameter optimization, we introduce the non-stochastic
best-arm identification problem. We identify an attractive algorithm for this setting that makes …

Cited by 765 · Related articles · All 8 versions

[PDF] nowpublishers.com

Bayesian reinforcement learning: A survey

M Ghavamzadeh, S Mannor, J Pineau… - … and Trends® in …, 2015 - nowpublishers.com

Bayesian methods for machine learning have been widely investigated, yielding principled
methods for incorporating prior information into inference algorithms. In this survey, we …

Cited by 583 · Related articles · All 11 versions