Learning to optimize via posterior sampling

DJ Russo, B Van Roy, A Kazerouni… - … and Trends® in …, 2018 - nowpublishers.com

Thompson sampling is an algorithm for online decision problems where actions are taken
sequentially in a manner that must balance between exploiting what is known to maximize …

被引用次数：1215 相关文章所有 34 个版本

[PDF] ieee.org

Taking the human out of the loop: A review of Bayesian optimization

B Shahriari, K Swersky, Z Wang… - Proceedings of the …, 2015 - ieeexplore.ieee.org

Big Data applications are typically associated with systems involving large numbers of
users, massive complex software systems, and large-scale heterogeneous computing and …

被引用次数：5642 相关文章所有 14 个版本

[PDF] neurips.cc

Supervised pretraining can learn in-context reinforcement learning

J Lee, A Xie, A Pacchiano, Y Chandak… - Advances in …, 2024 - proceedings.neurips.cc

Large transformer models trained on diverse datasets have shown a remarkable ability to
learn in-context, achieving high few-shot performance on tasks they were not explicitly …

被引用次数：42 相关文章所有 7 个版本

[PDF] arxiv.org

The statistical complexity of interactive decision making

DJ Foster, SM Kakade, J Qian, A Rakhlin - arXiv preprint arXiv:2112.13487, 2021 - arxiv.org

A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …

被引用次数：190 相关文章所有 6 个版本

[PDF] mlr.press

Poem: Out-of-distribution detection with posterior sampling

Y Ming, Y Fan, Y Li - International Conference on Machine …, 2022 - proceedings.mlr.press

Abstract Out-of-distribution (OOD) detection is indispensable for machine learning models
deployed in the open world. Recently, the use of an auxiliary outlier dataset during training …

被引用次数：95 相关文章所有 4 个版本

[PDF] mlr.press

Bilinear classes: A structural framework for provable generalization in rl

S Du, S Kakade, J Lee, S Lovett… - International …, 2021 - proceedings.mlr.press

Abstract This work introduces Bilinear Classes, a new structural framework, which permit
generalization in reinforcement learning in a wide variety of settings through the use of …

被引用次数：233 相关文章所有 8 个版本

[PDF] arxiv.org

Neural thompson sampling

W Zhang, D Zhou, L Li, Q Gu - arXiv preprint arXiv:2010.00827, 2020 - arxiv.org

Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-
armed bandit problems. In this paper, we propose a new algorithm, called Neural Thompson …

被引用次数：256 相关文章所有 8 个版本

[PDF] tor-lattimore.com

[图书][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

被引用次数：3074 相关文章所有 9 个版本

[PDF] mlr.press

Model-based reinforcement learning with value-targeted regression

A Ayoub, Z Jia, C Szepesvari… - … on Machine Learning, 2020 - proceedings.mlr.press

This paper studies model-based reinforcement learning (RL) for regret minimization. We
focus on finite-horizon episodic RL where the transition model $ P $ belongs to a known …

被引用次数：332 相关文章所有 8 个版本

[PDF] arxiv.org

Representation learning for online and offline rl in low-rank mdps

M Uehara, X Zhang, W Sun - arXiv preprint arXiv:2110.04652, 2021 - arxiv.org

This work studies the question of Representation Learning in RL: how can we learn a
compact low-dimensional representation such that on top of the representation we can …

被引用次数：149 相关文章所有 3 个版本