A tutorial on thompson sampling
Thompson sampling is an algorithm for online decision problems where actions are taken
sequentially in a manner that must balance between exploiting what is known to maximize …
sequentially in a manner that must balance between exploiting what is known to maximize …
Taking the human out of the loop: A review of Bayesian optimization
Big Data applications are typically associated with systems involving large numbers of
users, massive complex software systems, and large-scale heterogeneous computing and …
users, massive complex software systems, and large-scale heterogeneous computing and …
Supervised pretraining can learn in-context reinforcement learning
Large transformer models trained on diverse datasets have shown a remarkable ability to
learn in-context, achieving high few-shot performance on tasks they were not explicitly …
learn in-context, achieving high few-shot performance on tasks they were not explicitly …
The statistical complexity of interactive decision making
A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …
Poem: Out-of-distribution detection with posterior sampling
Abstract Out-of-distribution (OOD) detection is indispensable for machine learning models
deployed in the open world. Recently, the use of an auxiliary outlier dataset during training …
deployed in the open world. Recently, the use of an auxiliary outlier dataset during training …
Bilinear classes: A structural framework for provable generalization in rl
Abstract This work introduces Bilinear Classes, a new structural framework, which permit
generalization in reinforcement learning in a wide variety of settings through the use of …
generalization in reinforcement learning in a wide variety of settings through the use of …
Neural thompson sampling
Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-
armed bandit problems. In this paper, we propose a new algorithm, called Neural Thompson …
armed bandit problems. In this paper, we propose a new algorithm, called Neural Thompson …
[图书][B] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
and the multi-armed bandit model is a commonly used framework to address it. This …
Model-based reinforcement learning with value-targeted regression
This paper studies model-based reinforcement learning (RL) for regret minimization. We
focus on finite-horizon episodic RL where the transition model $ P $ belongs to a known …
focus on finite-horizon episodic RL where the transition model $ P $ belongs to a known …
Representation learning for online and offline rl in low-rank mdps
This work studies the question of Representation Learning in RL: how can we learn a
compact low-dimensional representation such that on top of the representation we can …
compact low-dimensional representation such that on top of the representation we can …