- Academic resource search

Deep reinforcement learning: a survey

H Wang, N Liu, Y Zhang, D Feng, F Huang, D Li… - Frontiers of Information …, 2020 - Springer

Deep reinforcement learning (RL) has become one of the most popular topics in artificial
intelligence research. It has been widely used in various fields, such as end-to-end control …

Cited by 264 · Related articles · All 11 versions

[PDF] nowpublishers.com

A tutorial on Thompson sampling

DJ Russo, B Van Roy, A Kazerouni… - … and Trends® in …, 2018 - nowpublishers.com

Thompson sampling is an algorithm for online decision problems where actions are taken
sequentially in a manner that must balance between exploiting what is known to maximize …

Cited by 1272 · Related articles · All 34 versions

[PDF] mlr.press

Is pessimism provably efficient for offline RL?

Y Jin, Z Yang, Z Wang - International Conference on …, 2021 - proceedings.mlr.press

We study offline reinforcement learning (RL), which aims to learn an optimal policy based on
a dataset collected a priori. Due to the lack of further interactions with the environment …

Cited by 446 · Related articles · All 7 versions

[PDF] arxiv.org

The statistical complexity of interactive decision making

DJ Foster, SM Kakade, J Qian, A Rakhlin - arXiv preprint arXiv:2112.13487, 2021 - arxiv.org

A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …

Cited by 203 · Related articles · All 6 versions

[PDF] tor-lattimore.com

[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 3240 · Related articles · All 9 versions

[PDF] ucl.ac.uk

Learning to reinforcement learn

JX Wang, Z Kurth-Nelson, D Tirumala, H Soyer… - arXiv preprint arXiv …, 2016 - arxiv.org

In recent years deep reinforcement learning (RL) systems have attained superhuman
performance in a number of challenging task domains. However, a major limitation of such …

Cited by 1107 · Related articles · All 8 versions

[PDF] arxiv.org

Deep Bayesian bandits showdown: An empirical comparison of Bayesian deep networks for Thompson sampling

C Riquelme, G Tucker, J Snoek - arXiv preprint arXiv:1802.09127, 2018 - arxiv.org

Recent advances in deep reinforcement learning have made significant strides in
performance on applications such as Go and Atari games. However, developing practical …

Cited by 408 · Related articles · All 5 versions

[PDF] neurips.cc

Epistemic neural networks

I Osband, Z Wen, SM Asghari… - Advances in …, 2023 - proceedings.neurips.cc

Intelligence relies on an agent's knowledge of what it does not know. This capability can be
assessed based on the quality of joint predictions of labels across multiple inputs. In …

Cited by 121 · Related articles · All 6 versions

[PDF] ssrn.com

Online decision making with high-dimensional covariates

H Bastani, M Bayati - Operations Research, 2020 - pubsonline.informs.org

Big data have enabled decision makers to tailor decisions at the individual level in a variety
of domains, such as personalized medicine and online advertising. Doing so involves …

Cited by 618 · Related articles · All 12 versions

[PDF] jmlr.org

Deep exploration via randomized value functions

I Osband, B Van Roy, DJ Russo, Z Wen - Journal of Machine Learning …, 2019 - jmlr.org

We study the use of randomized value functions to guide deep exploration in reinforcement
learning. This offers an elegant means for synthesizing statistically and computationally …

Cited by 359 · Related articles · All 9 versions