PC-PG: Policy cover directed exploration for provable policy gradient learning

A Agarwal, M Henaff, S Kakade… - Advances in neural …, 2020 - proceedings.neurips.cc
Direct policy gradient methods for reinforcement learning are a successful approach for a
variety of reasons: they are model free, they directly optimize the performance metric of …
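
For context on the direct policy gradient setting this entry describes, here is a minimal REINFORCE-style sketch with a tabular softmax policy on a made-up two-state MDP. It is generic background only, not the paper's PC-PG algorithm; the toy dynamics in step() and all constants are assumptions.

# Minimal REINFORCE sketch of the direct policy gradient setting (not PC-PG):
# a tabular softmax policy updated with Monte Carlo returns on a toy 2-state MDP.
import numpy as np

n_states, n_actions, gamma, lr = 2, 2, 0.99, 0.1
theta = np.zeros((n_states, n_actions))          # policy logits

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step(s, a):
    # Hypothetical toy dynamics: the chosen action is the next state,
    # and taking action 1 while in state 1 pays off.
    s_next = a
    r = 1.0 if (s == 1 and a == 1) else 0.0
    return s_next, r

for episode in range(2000):
    s, traj = 0, []
    for t in range(20):                          # fixed-horizon rollout
        probs = softmax(theta[s])
        a = np.random.choice(n_actions, p=probs)
        s_next, r = step(s, a)
        traj.append((s, a, r))
        s = s_next
    # REINFORCE update: grad log pi(a|s) times the discounted return-to-go.
    G = 0.0
    for s, a, r in reversed(traj):
        G = r + gamma * G
        probs = softmax(theta[s])
        grad_log = -probs                        # d/dtheta log softmax = e_a - probs
        grad_log[a] += 1.0
        theta[s] += lr * G * grad_log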

Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs

G Weisz, A György, T Kozuno… - Advances in Neural …, 2022 - proceedings.neurips.cc
We consider approximate dynamic programming in $\gamma$-discounted Markov decision
processes and apply it to approximate planning with linear value-function approximation …
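
As background for the approximate dynamic programming setting this entry describes, below is a rough sketch of approximate policy iteration with a linear Q-value fit on a small random MDP. The MDP, the feature map Phi, and the least-squares fit are illustrative assumptions, not the paper's Confident Approximate Policy Iteration procedure.

# Generic approximate policy iteration with linear Q-value features
# (an illustration of the linear / q^pi-realizable setting, not the paper's method).
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma, d = 5, 3, 0.9, 4
P = rng.dirichlet(np.ones(S), size=(S, A))       # transition kernel P[s, a, s']
R = rng.uniform(size=(S, A))                     # reward table
Phi = rng.normal(size=(S, A, d))                 # assumed feature map phi(s, a)

def q_pi(pi):
    # Exact policy evaluation on the small tabular MDP.
    R_pi = np.array([R[s, pi[s]] for s in range(S)])
    P_pi = np.array([P[s, pi[s]] for s in range(S)])
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
    return R + gamma * P @ V                     # Q^pi[s, a]

pi = np.zeros(S, dtype=int)
for it in range(10):
    Q = q_pi(pi)
    # Fit linear weights w so that Phi w approximates Q^pi (least squares).
    X = Phi.reshape(S * A, d)
    w, *_ = np.linalg.lstsq(X, Q.reshape(S * A), rcond=None)
    Q_hat = Phi @ w                              # approximate Q, shape (S, A)
    pi = Q_hat.argmax(axis=1)                    # greedy policy improvement
print("final greedy policy:", pi)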

Multi-timescale ensemble Q-learning for Markov decision process policy optimization

T Bozkus, U Mitra - IEEE Transactions on Signal Processing, 2024 - ieeexplore.ieee.org
Reinforcement learning (RL) is a classical tool to solve network control or policy optimization
problems in unknown environments. The original Q-learning suffers from performance and …
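
For reference, here is a baseline tabular Q-learning loop of the kind this entry builds on (the "original Q-learning" the snippet contrasts against). The toy chain environment is an assumption, and the sketch does not implement the multi-timescale ensemble method.

# Baseline tabular Q-learning on a hypothetical chain MDP.
import numpy as np

n_states, n_actions = 6, 2
alpha, gamma, eps = 0.1, 0.95, 0.1
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def step(s, a):
    # Hypothetical chain: action 1 moves right, reaching the last state pays 1.
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r

s = 0
for t in range(20000):
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
    s_next, r = step(s, a)
    # One-step Q-learning update with a bootstrapped max target.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = 0 if s_next == n_states - 1 else s_next  # restart the episode at the goal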

On the sample complexity and metastability of heavy-tailed policy search in continuous control

AS Bedi, A Parayil, J Zhang, M Wang… - Journal of Machine …, 2024 - jmlr.org
Reinforcement learning is a framework for interactive decision-making with incentives
sequentially revealed across time without a system dynamics model. Due to its scaling to …
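
To illustrate what "heavy-tailed policy search" refers to here, the sketch below contrasts sampling actions from a Gaussian policy with a heavy-tailed Cauchy policy. It shows only the parameterization difference, not the paper's algorithm or its sample-complexity and metastability analysis; all constants are assumptions.

# Heavy-tailed (Cauchy) vs. Gaussian policy sampling for continuous control.
import numpy as np

rng = np.random.default_rng(0)
mu, scale = 0.0, 1.0                             # assumed policy location and scale

def sample_gaussian(n):
    return rng.normal(mu, scale, size=n)

def sample_cauchy(n):
    # Heavy tails: occasional large actions drive wider exploration.
    return mu + scale * rng.standard_cauchy(n)

def log_prob_cauchy(a):
    # Log density of Cauchy(mu, scale), as used in a score-function gradient.
    return -np.log(np.pi * scale * (1.0 + ((a - mu) / scale) ** 2))

g, c = sample_gaussian(10000), sample_cauchy(10000)
print("fraction of |a| > 4 (Gaussian, Cauchy):", (np.abs(g) > 4).mean(), (np.abs(c) > 4).mean())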

Optimizing audio recommendations for the long-term: A reinforcement learning perspective

L Maystre, D Russo, Y Zhao - arXiv preprint arXiv:2302.03561, 2023 - arxiv.org
We study the problem of optimizing a recommender system for outcomes that occur over
several weeks or months. We begin by drawing on reinforcement learning to formulate a …

Learning the minimal representation of a dynamic system from transition data

MA Bennouna, D Pachamanova, G Perakis… - Available at SSRN …, 2021 - papers.ssrn.com
This paper proposes a framework for learning the most concise MDP model of a continuous
state space dynamic system from observed transition data. This setting is encountered in …

Functional Acceleration for Policy Mirror Descent

V Chelu, D Precup - arXiv preprint arXiv:2407.16602, 2024 - arxiv.org
We apply functional acceleration to the Policy Mirror Descent (PMD) general family of
algorithms, which cover a wide range of novel and fundamental methods in Reinforcement …
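
For orientation, here is a minimal tabular instance of the Policy Mirror Descent family this entry refers to, using the KL (multiplicative-weights) update where pi_{k+1}(a|s) is proportional to pi_k(a|s) * exp(eta * Q^{pi_k}(s, a)). The small random MDP and step size are assumptions, and the functional acceleration studied in the paper is not included.

# Minimal tabular Policy Mirror Descent with a KL mirror map.
import numpy as np

rng = np.random.default_rng(1)
S, A, gamma, eta = 4, 3, 0.9, 1.0
P = rng.dirichlet(np.ones(S), size=(S, A))       # P[s, a] over next states
R = rng.uniform(size=(S, A))
pi = np.full((S, A), 1.0 / A)                    # start from the uniform policy

def q_of(pi):
    # Exact evaluation of Q^pi on the tabular MDP.
    R_pi = (pi * R).sum(axis=1)
    P_pi = np.einsum('sa,sap->sp', pi, P)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
    return R + gamma * P @ V

for k in range(50):
    Q = q_of(pi)
    pi = pi * np.exp(eta * Q)                    # mirror descent (KL) step
    pi /= pi.sum(axis=1, keepdims=True)          # renormalize per state
print("greedy actions after PMD:", pi.argmax(axis=1))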

Randomized value functions via posterior state-abstraction sampling

D Arumugam, B Van Roy - arXiv preprint arXiv:2010.02383, 2020 - arxiv.org
State abstraction has been an essential tool for dramatically improving the sample efficiency
of reinforcement-learning algorithms. Indeed, by exposing and accentuating various types of …
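
As a generic illustration of state abstraction (not the posterior state-abstraction sampling proposed in this entry), the sketch below aggregates ground states through a fixed mapping phi and runs Q-learning over the abstract states; the mapping, toy dynamics, and constants are assumptions.

# Q-learning over an assumed fixed state abstraction phi: ground -> abstract.
import numpy as np

n_ground, n_abstract, n_actions = 12, 3, 2
alpha, gamma, eps = 0.2, 0.9, 0.1
phi = np.array([s % n_abstract for s in range(n_ground)])  # hypothetical mapping
Q = np.zeros((n_abstract, n_actions))            # values are shared within abstract states
rng = np.random.default_rng(0)

def step(s, a):
    # Hypothetical ring dynamics on the ground states; state 0 pays reward.
    s_next = (s + (1 if a == 1 else -1)) % n_ground
    r = 1.0 if s_next == 0 else 0.0
    return s_next, r

s = 3
for t in range(5000):
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[phi[s]].argmax())
    s_next, r = step(s, a)
    # Update the abstract-state value; all ground states with the same phi share it.
    Q[phi[s], a] += alpha * (r + gamma * Q[phi[s_next]].max() - Q[phi[s], a])
    s = s_next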

Towards applicable state abstractions: a preview in strategy games

L Xu, D Perez-Liebana, A Dockhorn - The Multi-disciplinary …, 2022 - diego-perez.net
State Abstraction is a methodology that aims to simplify planning problems and enable
planners to deal with more complex environments. It is a useful tool that helps Artificial …

The Complexity of Reinforcement Learning with Linear Function Approximation

G Weisz - 2024 - discovery.ucl.ac.uk
In this thesis we present contributions to the theoretical foundations of large-scale
reinforcement learning (RL) with linear function approximation, with a focus on establishing …