PC-PG: Policy cover directed exploration for provable policy gradient learning
Direct policy gradient methods for reinforcement learning are a successful approach for a
variety of reasons: they are model free, they directly optimize the performance metric of …
Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs
We consider approximate dynamic programming in $\gamma$-discounted Markov decision
processes and apply it to approximate planning with linear value-function approximation …
Multi-timescale ensemble Q-learning for Markov decision process policy optimization
Reinforcement learning (RL) is a classical tool to solve network control or policy optimization
problems in unknown environments. The original Q-learning suffers from performance and …
On the sample complexity and metastability of heavy-tailed policy search in continuous control
Reinforcement learning is a framework for interactive decision-making with incentives
sequentially revealed across time without a system dynamics model. Due to its scaling to …
Optimizing audio recommendations for the long-term: A reinforcement learning perspective
We study the problem of optimizing a recommender system for outcomes that occur over
several weeks or months. We begin by drawing on reinforcement learning to formulate a …
Learning the minimal representation of a dynamic system from transition data
This paper proposes a framework for learning the most concise MDP model of a continuous
state space dynamic system from observed transition data. This setting is encountered in …
Functional Acceleration for Policy Mirror Descent
We apply functional acceleration to the Policy Mirror Descent (PMD) general family of
algorithms, which cover a wide range of novel and fundamental methods in Reinforcement …
Randomized value functions via posterior state-abstraction sampling
D Arumugam, B Van Roy - arXiv preprint arXiv:2010.02383, 2020 - arxiv.org
State abstraction has been an essential tool for dramatically improving the sample efficiency
of reinforcement-learning algorithms. Indeed, by exposing and accentuating various types of …
Towards applicable state abstractions: a preview in strategy games
State Abstraction is a methodology that aims to simplify planning problems and enable
planners to deal with more complex environments. It is a useful tool that helps Artificial …
The Complexity of Reinforcement Learning with Linear Function Approximation
G Weisz - 2024 - discovery.ucl.ac.uk
In this thesis we present contributions to the theoretical foundations of large-scale
reinforcement learning (RL) with linear function approximation, with a focus on establishing …