Reinforcement learning, bit by bit

D Abel, A Barreto, B Van Roy… - Advances in …, 2024 - proceedings.neurips.cc

In a standard view of the reinforcement learning problem, an agent's goal is to efficiently
identify a policy that maximizes long-term reward. However, this perspective is based on a …

Cited by 64 · Related articles · All 8 versions

Settling the reward hypothesis

M Bowling, JD Martin, D Abel… - … on Machine Learning, 2023 - proceedings.mlr.press

The reward hypothesis posits that, "all of what we mean by goals and purposes can be well
thought of as maximization of the expected value of the cumulative sum of a received scalar …

Cited by 38 · Related articles · All 8 versions

An invitation to deep reinforcement learning

B Jaeger, A Geiger - Foundations and Trends® in …, 2024 - nowpublishers.com

Training a deep neural network to maximize a target objective has become the standard
recipe for successful machine learning over the last decade. These networks can be …

Cited by 6 · Related articles · All 5 versions

Approximate Thompson sampling via epistemic neural networks

I Osband, Z Wen, SM Asghari… - Uncertainty in …, 2023 - proceedings.mlr.press

Thompson sampling (TS) is a popular heuristic for action selection, but it requires sampling
from a posterior distribution. Unfortunately, this can become computationally intractable in …

Cited by 24 · Related articles · All 7 versions

Continual learning as computationally constrained reinforcement learning

S Kumar, H Marklund, A Rao, Y Zhu, HJ Jeon… - arXiv preprint arXiv …, 2023 - arxiv.org

An agent that efficiently accumulates knowledge to develop increasingly sophisticated skills
over a long lifetime could advance the frontier of artificial intelligence capabilities. The …

Cited by 18 · Related articles · All 2 versions

Multiagent reinforcement learning-based adaptive sampling for conformational dynamics of proteins

DE Kleiman, D Shukla - Journal of Chemical Theory and …, 2022 - ACS Publications

Machine learning is increasingly applied to improve the efficiency and accuracy of molecular
dynamics (MD) simulations. Although the growth of distributed computer clusters has …

Cited by 25 · Related articles · All 5 versions

Regret bounds for information-directed reinforcement learning

B Hao, T Lattimore - Advances in neural information …, 2022 - proceedings.neurips.cc

Information-directed sampling (IDS) has revealed its potential as a data-efficient
algorithm for reinforcement learning (RL). However, theoretical understanding of IDS for …

Cited by 23 · Related articles · All 8 versions

Learning and information in stochastic networks and queues

N Walton, K Xu - Tutorials in Operations Research …, 2021 - pubsonline.informs.org

We review the role of information and learning in the stability and optimization of queueing
systems. In recent years, techniques from supervised learning, online learning, and …

Cited by 34 · Related articles · All 3 versions

Contextual information-directed sampling

B Hao, T Lattimore, C Qin - International Conference on …, 2022 - proceedings.mlr.press

Information-directed sampling (IDS) has recently demonstrated its potential as a
data-efficient reinforcement learning algorithm. However, it is still unclear what is the right …

Cited by 17 · Related articles · All 4 versions

Simple agent, complex environment: Efficient reinforcement learning with agent states

S Dong, B Van Roy, Z Zhou - Journal of Machine Learning Research, 2022 - jmlr.org

We design a simple reinforcement learning (RL) agent that implements an optimistic version
of Q-learning and establish through regret analysis that this agent can operate with some …

Cited by 47 · Related articles · All 5 versions