A tutorial on thompson sampling

DJ Russo, B Van Roy, A Kazerouni… - … and Trends® in …, 2018 - nowpublishers.com
Thompson sampling is an algorithm for online decision problems where actions are taken
sequentially in a manner that must balance between exploiting what is known to maximize …

Taking the human out of the loop: A review of Bayesian optimization

B Shahriari, K Swersky, Z Wang… - Proceedings of the …, 2015 - ieeexplore.ieee.org
Big Data applications are typically associated with systems involving large numbers of
users, massive complex software systems, and large-scale heterogeneous computing and …

[图书][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Bao: Making learned query optimization practical

R Marcus, P Negi, H Mao, N Tatbul… - Proceedings of the …, 2021 - dl.acm.org
Recent efforts applying machine learning techniques to query optimization have shown few
practical gains due to substantive training overhead, inability to adapt to changes, and poor …

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Learning to reinforcement learn

JX Wang, Z Kurth-Nelson, D Tirumala, H Soyer… - arXiv preprint arXiv …, 2016 - arxiv.org
In recent years deep reinforcement learning (RL) systems have attained superhuman
performance in a number of challenging task domains. However, a major limitation of such …

Weight uncertainty in neural network

C Blundell, J Cornebise… - … on machine learning, 2015 - proceedings.mlr.press
We introduce a new, efficient, principled and backpropagation-compatible algorithm for
learning a probability distribution on the weights of a neural network, called Bayes by …

On kernelized multi-armed bandits

SR Chowdhury, A Gopalan - International Conference on …, 2017 - proceedings.mlr.press
We consider the stochastic bandit problem with a continuous set of arms, with the expected
reward function over the arms assumed to be fixed but unknown. We provide two new …

Non-stochastic best arm identification and hyperparameter optimization

K Jamieson, A Talwalkar - Artificial intelligence and statistics, 2016 - proceedings.mlr.press
Motivated by the task of hyperparameter optimization, we introduce the\em non-stochastic
best-arm identification problem. We identify an attractive algorithm for this setting that makes …

Bayesian reinforcement learning: A survey

M Ghavamzadeh, S Mannor, J Pineau… - … and Trends® in …, 2015 - nowpublishers.com
Bayesian methods for machine learning have been widely investigated, yielding principled
methods for incorporating prior information into inference algorithms. In this survey, we …