Exploration in deep reinforcement learning: A survey

P Ladosz, L Weng, M Kim, H Oh - Information Fusion, 2022 - Elsevier
This paper reviews exploration techniques in deep reinforcement learning. Exploration
techniques are of primary importance when solving sparse reward problems. In sparse …

Offline reinforcement learning: Tutorial, review, and perspectives on open problems

S Levine, A Kumar, G Tucker, J Fu - arXiv preprint arXiv:2005.01643, 2020 - arxiv.org
In this tutorial article, we aim to provide the reader with the conceptual tools needed to get
started on research on offline reinforcement learning algorithms: reinforcement learning …

Combo: Conservative offline model-based policy optimization

T Yu, A Kumar, R Rafailov… - Advances in neural …, 2021 - proceedings.neurips.cc
Model-based reinforcement learning (RL) algorithms, which learn a dynamics
model from logged experience and perform conservative planning under the learned model …

Conservative q-learning for offline reinforcement learning

A Kumar, A Zhou, G Tucker… - Advances in Neural …, 2020 - proceedings.neurips.cc
Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is
a key challenge for large-scale real-world applications. Offline RL algorithms promise to …

The statistical complexity of interactive decision making

DJ Foster, SM Kakade, J Qian, A Rakhlin - arXiv preprint arXiv:2112.13487, 2021 - arxiv.org
A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …

Poem: Out-of-distribution detection with posterior sampling

Y Ming, Y Fan, Y Li - International Conference on Machine …, 2022 - proceedings.mlr.press
Out-of-distribution (OOD) detection is indispensable for machine learning models
deployed in the open world. Recently, the use of an auxiliary outlier dataset during training …

Model-based reinforcement learning: A survey

TM Moerland, J Broekens, A Plaat… - … and Trends® in …, 2023 - nowpublishers.com
Sequential decision making, commonly formalized as Markov Decision Process (MDP)
optimization, is an important challenge in artificial intelligence. Two key approaches to this …

[Book][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Neural approaches to conversational AI

J Gao, M Galley, L Li - The 41st international ACM SIGIR conference on …, 2018 - dl.acm.org
This tutorial surveys neural approaches to conversational AI that were developed in the last
few years. We group conversational systems into three categories: (1) question answering …

A simple neural attentive meta-learner

N Mishra, M Rohaninejad, X Chen, P Abbeel - arXiv preprint arXiv …, 2017 - arxiv.org
Deep neural networks excel in regimes with large amounts of data, but tend to struggle
when data is scarce or when they need to adapt quickly to changes in the task. In response …