A tutorial on Thompson sampling
Thompson sampling is an algorithm for online decision problems where actions are taken
sequentially in a manner that must balance between exploiting what is known to maximize …
Taking the human out of the loop: A review of Bayesian optimization
Big Data applications are typically associated with systems involving large numbers of
users, massive complex software systems, and large-scale heterogeneous computing and …
[BOOK][B] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Bao: Making learned query optimization practical
Recent efforts applying machine learning techniques to query optimization have shown few
practical gains due to substantive training overhead, inability to adapt to changes, and poor …
Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Learning to reinforcement learn
In recent years deep reinforcement learning (RL) systems have attained superhuman
performance in a number of challenging task domains. However, a major limitation of such …
Weight uncertainty in neural network
C Blundell, J Cornebise… - … on machine learning, 2015 - proceedings.mlr.press
We introduce a new, efficient, principled and backpropagation-compatible algorithm for
learning a probability distribution on the weights of a neural network, called Bayes by …
On kernelized multi-armed bandits
SR Chowdhury, A Gopalan - International Conference on …, 2017 - proceedings.mlr.press
We consider the stochastic bandit problem with a continuous set of arms, with the expected
reward function over the arms assumed to be fixed but unknown. We provide two new …
Non-stochastic best arm identification and hyperparameter optimization
K Jamieson, A Talwalkar - Artificial intelligence and statistics, 2016 - proceedings.mlr.press
Motivated by the task of hyperparameter optimization, we introduce the non-stochastic
best-arm identification problem. We identify an attractive algorithm for this setting that makes …
Bayesian reinforcement learning: A survey
Bayesian methods for machine learning have been widely investigated, yielding principled
methods for incorporating prior information into inference algorithms. In this survey, we …