A definition of continual reinforcement learning
In a standard view of the reinforcement learning problem, an agent's goal is to efficiently
identify a policy that maximizes long-term reward. However, this perspective is based on a …
Settling the reward hypothesis
The reward hypothesis posits that "all of what we mean by goals and purposes can be well
thought of as maximization of the expected value of the cumulative sum of a received scalar …
An invitation to deep reinforcement learning
Training a deep neural network to maximize a target objective has become the standard
recipe for successful machine learning over the last decade. These networks can be …
Approximate Thompson sampling via epistemic neural networks
Thompson sampling (TS) is a popular heuristic for action selection, but it requires sampling
from a posterior distribution. Unfortunately, this can become computationally intractable in …
Continual learning as computationally constrained reinforcement learning
An agent that efficiently accumulates knowledge to develop increasingly sophisticated skills
over a long lifetime could advance the frontier of artificial intelligence capabilities. The …
Multiagent reinforcement learning-based adaptive sampling for conformational dynamics of proteins
DE Kleiman, D Shukla - Journal of Chemical Theory and …, 2022 - ACS Publications
Machine learning is increasingly applied to improve the efficiency and accuracy of molecular
dynamics (MD) simulations. Although the growth of distributed computer clusters has …
Regret bounds for information-directed reinforcement learning
B Hao, T Lattimore - Advances in neural information …, 2022 - proceedings.neurips.cc
Information-directed sampling (IDS) has revealed its potential as a data-efficient
algorithm for reinforcement learning (RL). However, theoretical understanding of IDS for …
Learning and information in stochastic networks and queues
We review the role of information and learning in the stability and optimization of queueing
systems. In recent years, techniques from supervised learning, online learning, and …
Contextual information-directed sampling
Information-directed sampling (IDS) has recently demonstrated its potential as a
data-efficient reinforcement learning algorithm. However, it is still unclear what is the right …
Simple agent, complex environment: Efficient reinforcement learning with agent states
We design a simple reinforcement learning (RL) agent that implements an optimistic version
of Q-learning and establish through regret analysis that this agent can operate with some …