An information-theoretic analysis of Thompson sampling

SCH Hoi, D Sahoo, J Lu, P Zhao - Neurocomputing, 2021 - Elsevier

Online learning represents a family of machine learning methods, where a learner attempts
to tackle some predictive (or any type of decision-making) task by learning from a sequence …

Cited by 771 Related articles All 6 versions

[PDF] nowpublishers.com

A tutorial on Thompson sampling

DJ Russo, B Van Roy, A Kazerouni… - … and Trends® in …, 2018 - nowpublishers.com

Thompson sampling is an algorithm for online decision problems where actions are taken
sequentially in a manner that must balance between exploiting what is known to maximize …

Cited by 1189 Related articles All 34 versions

[PDF] mlr.press

Is pessimism provably efficient for offline RL?

Y Jin, Z Yang, Z Wang - International Conference on …, 2021 - proceedings.mlr.press

We study offline reinforcement learning (RL), which aims to learn an optimal policy based on
a dataset collected a priori. Due to the lack of further interactions with the environment …

Cited by 405 Related articles All 7 versions

[PDF] tor-lattimore.com

[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 2981 Related articles All 9 versions

[PDF] arxiv.org

Derivative-free optimization methods

J Larson, M Menickelly, SM Wild - Acta Numerica, 2019 - cambridge.org

In many optimization problems arising from scientific, engineering and artificial intelligence
applications, objective and constraint functions are available only as the output of a black …

Cited by 469 Related articles All 9 versions

[PDF] nowpublishers.com

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com

Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Cited by 1108 Related articles All 7 versions

[PDF] arxiv.org

Functional variational Bayesian neural networks

S Sun, G Zhang, J Shi, R Grosse - arXiv preprint arXiv:1903.05779, 2019 - arxiv.org

Variational Bayesian neural networks (BNNs) perform variational inference over weights, but
it is difficult to specify meaningful priors and approximate posteriors in a high-dimensional …

Cited by 300 Related articles All 6 versions

[PDF] nowpublishers.com

Bayesian reinforcement learning: A survey

M Ghavamzadeh, S Mannor, J Pineau… - … and Trends® in …, 2015 - nowpublishers.com

Bayesian methods for machine learning have been widely investigated, yielding principled
methods for incorporating prior information into inference algorithms. In this survey, we …

Cited by 558 Related articles All 11 versions

[PDF] mlr.press

Parallelised Bayesian optimisation via Thompson sampling

K Kandasamy, A Krishnamurthy… - International …, 2018 - proceedings.mlr.press

We design and analyse variations of the classical Thompson sampling (TS) procedure for
Bayesian optimisation (BO) in settings where function evaluations are expensive but can be …

Cited by 281 Related articles All 5 versions

[PDF] mlr.press

On information gain and regret bounds in Gaussian process bandits

S Vakili, K Khezeli, V Picheny - International Conference on …, 2021 - proceedings.mlr.press

Consider the sequential optimization of an expensive to evaluate and possibly non-convex
objective function $f$ from noisy feedback, that can be considered as a continuum-armed …

Cited by 121 Related articles All 4 versions