PC-PG: Policy cover directed exploration for provable policy gradient learning

A Agarwal, M Henaff, S Kakade… - Advances in neural …, 2020 - proceedings.neurips.cc
Direct policy gradient methods for reinforcement learning are a successful approach for a
variety of reasons: they are model free, they directly optimize the performance metric of …
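
For context on the direct policy gradient setting this entry describes, here is a minimal REINFORCE-style sketch with a tabular softmax policy on a made-up two-state MDP. It is generic background only, not the paper's PC-PG algorithm; the toy dynamics in step() and all constants are assumptions.

# Minimal REINFORCE sketch of the direct policy gradient setting (not PC-PG):
# a tabular softmax policy updated with Monte Carlo returns on a toy 2-state MDP.
import numpy as np

n_states, n_actions, gamma, lr = 2, 2, 0.99, 0.1
theta = np.zeros((n_states, n_actions))          # policy logits

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step(s, a):
    # Hypothetical toy dynamics: the chosen action is the next state,
    # and taking action 1 while in state 1 pays off.
    s_next = a
    r = 1.0 if (s == 1 and a == 1) else 0.0
    return s_next, r

for episode in range(2000):
    s, traj = 0, []
    for t in range(20):                          # fixed-horizon rollout
        probs = softmax(theta[s])
        a = np.random.choice(n_actions, p=probs)
        s_next, r = step(s, a)
        traj.append((s, a, r))
        s = s_next
    # REINFORCE update: grad log pi(a|s) times the discounted return-to-go.
    G = 0.0
    for s, a, r in reversed(traj):
        G = r + gamma * G
        probs = softmax(theta[s])
        grad_log = -probs                        # d/dtheta log softmax = e_a - probs
        grad_log[a] += 1.0
        theta[s] += lr * G * grad_log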

Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs

G Weisz, A György, T Kozuno… - Advances in Neural …, 2022 - proceedings.neurips.cc
We consider approximate dynamic programming in $\gamma$-discounted Markov decision
processes and apply it to approximate planning with linear value-function approximation …
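
As background for the approximate dynamic programming setting this entry describes, below is a rough sketch of approximate policy iteration with a linear Q-value fit on a small random MDP. The MDP, the feature map Phi, and the least-squares fit are illustrative assumptions, not the paper's Confident Approximate Policy Iteration procedure.

# Generic approximate policy iteration with linear Q-value features
# (an illustration of the linear / q^pi-realizable setting, not the paper's method).
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma, d = 5, 3, 0.9, 4
P = rng.dirichlet(np.ones(S), size=(S, A))       # transition kernel P[s, a, s']
R = rng.uniform(size=(S, A))                     # reward table
Phi = rng.normal(size=(S, A, d))                 # assumed feature map phi(s, a)

def q_pi(pi):
    # Exact policy evaluation on the small tabular MDP.
    R_pi = np.array([R[s, pi[s]] for s in range(S)])
    P_pi = np.array([P[s, pi[s]] for s in range(S)])
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
    return R + gamma * P @ V                     # Q^pi[s, a]

pi = np.zeros(S, dtype=int)
for it in range(10):
    Q = q_pi(pi)
    # Fit linear weights w so that Phi w approximates Q^pi (least squares).
    X = Phi.reshape(S * A, d)
    w, *_ = np.linalg.lstsq(X, Q.reshape(S * A), rcond=None)
    Q_hat = Phi @ w                              # approximate Q, shape (S, A)
    pi = Q_hat.argmax(axis=1)                    # greedy policy improvement
print("final greedy policy:", pi)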

Multi-timescale ensemble Q-learning for Markov decision process policy optimization

T Bozkus, U Mitra - IEEE Transactions on Signal Processing, 2024 - ieeexplore.ieee.org
Reinforcement learning (RL) is a classical tool to solve network control or policy optimization
problems in unknown environments. The original Q-learning suffers from performance and …
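
For reference, here is a baseline tabular Q-learning loop of the kind this entry builds on (the "original Q-learning" the snippet contrasts against). The toy chain environment is an assumption, and the sketch does not implement the multi-timescale ensemble method.

# Baseline tabular Q-learning on a hypothetical chain MDP.
import numpy as np

n_states, n_actions = 6, 2
alpha, gamma, eps = 0.1, 0.95, 0.1
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def step(s, a):
    # Hypothetical chain: action 1 moves right, reaching the last state pays 1.
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r

s = 0
for t in range(20000):
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
    s_next, r = step(s, a)
    # One-step Q-learning update with a bootstrapped max target.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = 0 if s_next == n_states - 1 else s_next  # restart the episode at the goal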

On the sample complexity and metastability of heavy-tailed policy search in continuous control

AS Bedi, A Parayil, J Zhang, M Wang… - Journal of Machine …, 2024 - jmlr.org
Reinforcement learning is a framework for interactive decision-making with incentives
sequentially revealed across time without a system dynamics model. Due to its scaling to …
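
To illustrate what "heavy-tailed policy search" refers to here, the sketch below contrasts sampling actions from a Gaussian policy with a heavy-tailed Cauchy policy. It shows only the parameterization difference, not the paper's algorithm or its sample-complexity and metastability analysis; all constants are assumptions.

# Heavy-tailed (Cauchy) vs. Gaussian policy sampling for continuous control.
import numpy as np

rng = np.random.default_rng(0)
mu, scale = 0.0, 1.0                             # assumed policy location and scale

def sample_gaussian(n):
    return rng.normal(mu, scale, size=n)

def sample_cauchy(n):
    # Heavy tails: occasional large actions drive wider exploration.
    return mu + scale * rng.standard_cauchy(n)

def log_prob_cauchy(a):
    # Log density of Cauchy(mu, scale), as used in a score-function gradient.
    return -np.log(np.pi * scale * (1.0 + ((a - mu) / scale) ** 2))

g, c = sample_gaussian(10000), sample_cauchy(10000)
print("fraction of |a| > 4 (Gaussian, Cauchy):", (np.abs(g) > 4).mean(), (np.abs(c) > 4).mean())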

Optimizing audio recommendations for the long-term: A reinforcement learning perspective

L Maystre, D Russo, Y Zhao - arXiv preprint arXiv:2302.03561, 2023 - arxiv.org
We study the problem of optimizing a recommender system for outcomes that occur over
several weeks or months. We begin by drawing on reinforcement learning to formulate a …

Learning the minimal representation of a dynamic system from transition data

MA Bennouna, D Pachamanova, G Perakis… - Available at SSRN …, 2021 - papers.ssrn.com
This paper proposes a framework for learning the most concise MDP model of a continuous
state space dynamic system from observed transition data. This setting is encountered in …

Functional Acceleration for Policy Mirror Descent

V Chelu, D Precup - arXiv preprint arXiv:2407.16602, 2024 - arxiv.org
We apply functional acceleration to the Policy Mirror Descent (PMD) general family of
algorithms, which cover a wide range of novel and fundamental methods in Reinforcement …
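
For orientation, here is a minimal tabular instance of the Policy Mirror Descent family this entry refers to, using the KL (multiplicative-weights) update where pi_{k+1}(a|s) is proportional to pi_k(a|s) * exp(eta * Q^{pi_k}(s, a)). The small random MDP and step size are assumptions, and the functional acceleration studied in the paper is not included.

# Minimal tabular Policy Mirror Descent with a KL mirror map.
import numpy as np

rng = np.random.default_rng(1)
S, A, gamma, eta = 4, 3, 0.9, 1.0
P = rng.dirichlet(np.ones(S), size=(S, A))       # P[s, a] over next states
R = rng.uniform(size=(S, A))
pi = np.full((S, A), 1.0 / A)                    # start from the uniform policy

def q_of(pi):
    # Exact evaluation of Q^pi on the tabular MDP.
    R_pi = (pi * R).sum(axis=1)
    P_pi = np.einsum('sa,sap->sp', pi, P)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
    return R + gamma * P @ V

for k in range(50):
    Q = q_of(pi)
    pi = pi * np.exp(eta * Q)                    # mirror descent (KL) step
    pi /= pi.sum(axis=1, keepdims=True)          # renormalize per state
print("greedy actions after PMD:", pi.argmax(axis=1))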

Randomized value functions via posterior state-abstraction sampling

D Arumugam, B Van Roy - arXiv preprint arXiv:2010.02383, 2020 - arxiv.org
State abstraction has been an essential tool for dramatically improving the sample efficiency
of reinforcement-learning algorithms. Indeed, by exposing and accentuating various types of …
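
As a generic illustration of state abstraction (not the posterior state-abstraction sampling proposed in this entry), the sketch below aggregates ground states through a fixed mapping phi and runs Q-learning over the abstract states; the mapping, toy dynamics, and constants are assumptions.

# Q-learning over an assumed fixed state abstraction phi: ground -> abstract.
import numpy as np

n_ground, n_abstract, n_actions = 12, 3, 2
alpha, gamma, eps = 0.2, 0.9, 0.1
phi = np.array([s % n_abstract for s in range(n_ground)])  # hypothetical mapping
Q = np.zeros((n_abstract, n_actions))            # values are shared within abstract states
rng = np.random.default_rng(0)

def step(s, a):
    # Hypothetical ring dynamics on the ground states; state 0 pays reward.
    s_next = (s + (1 if a == 1 else -1)) % n_ground
    r = 1.0 if s_next == 0 else 0.0
    return s_next, r

s = 3
for t in range(5000):
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[phi[s]].argmax())
    s_next, r = step(s, a)
    # Update the abstract-state value; all ground states with the same phi share it.
    Q[phi[s], a] += alpha * (r + gamma * Q[phi[s_next]].max() - Q[phi[s], a])
    s = s_next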

Towards applicable state abstractions: a preview in strategy games

L Xu, D Perez-Liebana, A Dockhorn - The Multi-disciplinary …, 2022 - diego-perez.net
State Abstraction is a methodology that aims to simplify planning problems and enable
planners to deal with more complex environments. It is a useful tool that helps Artificial …

The Complexity of Reinforcement Learning with Linear Function Approximation

G Weisz - 2024 - discovery.ucl.ac.uk
In this thesis we present contributions to the theoretical foundations of large-scale
reinforcement learning (RL) with linear function approximation, with a focus on establishing …