Deep reinforcement learning: a survey
Deep reinforcement learning (RL) has become one of the most popular topics in artificial
intelligence research. It has been widely used in various fields, such as end-to-end control …
A tutorial on Thompson sampling
Thompson sampling is an algorithm for online decision problems where actions are taken
sequentially in a manner that must balance between exploiting what is known to maximize …
Is pessimism provably efficient for offline RL?
We study offline reinforcement learning (RL), which aims to learn an optimal policy based on
a dataset collected a priori. Due to the lack of further interactions with the environment …
The statistical complexity of interactive decision making
A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …
[Book] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Learning to reinforcement learn
In recent years deep reinforcement learning (RL) systems have attained superhuman
performance in a number of challenging task domains. However, a major limitation of such …
Deep Bayesian bandits showdown: An empirical comparison of Bayesian deep networks for Thompson sampling
Recent advances in deep reinforcement learning have made significant strides in
performance on applications such as Go and Atari games. However, developing practical …
Epistemic neural networks
Intelligence relies on an agent's knowledge of what it does not know. This capability can be
assessed based on the quality of joint predictions of labels across multiple inputs. In …
Online decision making with high-dimensional covariates
Big data have enabled decision makers to tailor decisions at the individual level in a variety
of domains, such as personalized medicine and online advertising. Doing so involves …
Deep exploration via randomized value functions
We study the use of randomized value functions to guide deep exploration in reinforcement
learning. This offers an elegant means for synthesizing statistically and computationally …