Deep reinforcement learning: a survey

H Wang, N Liu, Y Zhang, D Feng, F Huang, D Li… - Frontiers of Information …, 2020 - Springer
Deep reinforcement learning (RL) has become one of the most popular topics in artificial
intelligence research. It has been widely used in various fields, such as end-to-end control …

A tutorial on Thompson sampling

DJ Russo, B Van Roy, A Kazerouni… - … and Trends® in …, 2018 - nowpublishers.com
Thompson sampling is an algorithm for online decision problems where actions are taken
sequentially in a manner that must balance between exploiting what is known to maximize …

Is pessimism provably efficient for offline RL?

Y Jin, Z Yang, Z Wang - International Conference on …, 2021 - proceedings.mlr.press
We study offline reinforcement learning (RL), which aims to learn an optimal policy based on
a dataset collected a priori. Due to the lack of further interactions with the environment …

The statistical complexity of interactive decision making

DJ Foster, SM Kakade, J Qian, A Rakhlin - arXiv preprint arXiv:2112.13487, 2021 - arxiv.org
A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …

[Book][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Learning to reinforcement learn

JX Wang, Z Kurth-Nelson, D Tirumala, H Soyer… - arXiv preprint arXiv …, 2016 - arxiv.org
In recent years deep reinforcement learning (RL) systems have attained superhuman
performance in a number of challenging task domains. However, a major limitation of such …

Deep Bayesian bandits showdown: An empirical comparison of Bayesian deep networks for Thompson sampling

C Riquelme, G Tucker, J Snoek - arXiv preprint arXiv:1802.09127, 2018 - arxiv.org
Recent advances in deep reinforcement learning have made significant strides in
performance on applications such as Go and Atari games. However, developing practical …

Epistemic neural networks

I Osband, Z Wen, SM Asghari… - Advances in …, 2023 - proceedings.neurips.cc
Intelligence relies on an agent's knowledge of what it does not know. This capability can be
assessed based on the quality of joint predictions of labels across multiple inputs. In …

Online decision making with high-dimensional covariates

H Bastani, M Bayati - Operations Research, 2020 - pubsonline.informs.org
Big data have enabled decision makers to tailor decisions at the individual level in a variety
of domains, such as personalized medicine and online advertising. Doing so involves …

Deep exploration via randomized value functions

I Osband, B Van Roy, DJ Russo, Z Wen - Journal of Machine Learning …, 2019 - jmlr.org
We study the use of randomized value functions to guide deep exploration in reinforcement
learning. This offers an elegant means for synthesizing statistically and computationally …