The statistical complexity of interactive decision making

DJ Foster, SM Kakade, J Qian, A Rakhlin - arXiv preprint arXiv:2112.13487, 2021 - arxiv.org

A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …

Cited by 173 Related articles All 6 versions

Bellman eluder dimension: New rich classes of RL problems, and sample-efficient algorithms

C Jin, Q Liu, S Miryoosefi - Advances in neural information …, 2021 - proceedings.neurips.cc

Finding the minimal structural assumptions that empower sample-efficient learning is one of
the most important research directions in Reinforcement Learning (RL). This paper …

Cited by 238 Related articles All 11 versions

[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 2922 Related articles All 9 versions

Representation learning for online and offline RL in low-rank MDPs

M Uehara, X Zhang, W Sun - arXiv preprint arXiv:2110.04652, 2021 - arxiv.org

This work studies the question of Representation Learning in RL: how can we learn a
compact low-dimensional representation such that on top of the representation we can …

Cited by 132 Related articles All 3 versions

Top-k off-policy correction for a REINFORCE recommender system

M Chen, A Beutel, P Covington, S Jain… - Proceedings of the …, 2019 - dl.acm.org

Industrial recommender systems deal with extremely large action spaces--many millions of
items to recommend. Moreover, they need to serve billions of users, who are unique at any …

Cited by 496 Related articles All 10 versions

FLAMBE: Structural complexity and representation learning of low rank MDPs

A Agarwal, S Kakade… - Advances in neural …, 2020 - proceedings.neurips.cc

In order to deal with the curse of dimensionality in reinforcement learning (RL), it is common
practice to make parametric assumptions where values or policies are functions of some low …

Cited by 263 Related articles All 10 versions

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com

Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Cited by 1073 Related articles All 7 versions

Concrete problems in AI safety

D Amodei, C Olah, J Steinhardt, P Christiano… - arXiv preprint arXiv …, 2016 - arxiv.org

Rapid progress in machine learning and artificial intelligence (AI) has brought increasing
attention to the potential impacts of AI technologies on society. In this paper we discuss one …

Cited by 2795 Related articles All 9 versions

A study on overfitting in deep reinforcement learning

C Zhang, O Vinyals, R Munos, S Bengio - arXiv preprint arXiv:1804.06893, 2018 - arxiv.org

Recent years have witnessed significant progress in deep Reinforcement Learning (RL).
Empowered with large scale neural networks, carefully designed architectures, novel …

Cited by 471 Related articles All 6 versions

Neural contextual bandits with UCB-based exploration

D Zhou, L Li, Q Gu - International Conference on Machine …, 2020 - proceedings.mlr.press

We study the stochastic contextual bandit problem, where the reward is generated from an
unknown function with additive noise. No assumption is made about the reward function …

Cited by 256 Related articles All 10 versions