Improving policies without measuring merits

K Doya - Neural computation, 2000 - direct.mit.edu

This article presents a reinforcement learning framework for continuous-time dynamical
systems without a priori discretization of time, state, and action. Based on the Hamilton-Jacobi …

Cited by 1324 · Related articles · All 19 versions

[Book][B] Approximate solutions to Markov decision processes

GJ Gordon - 1999 - search.proquest.com

One of the basic problems of machine learning is deciding how to act in an uncertain world.
For example, if I want my robot to bring me a cup of coffee, it must be able to compute the …

Cited by 224 · Related articles · All 13 versions

Control frequency adaptation via action persistence in batch reinforcement learning

AM Metelli, F Mazzolini, L Bisi… - International …, 2020 - proceedings.mlr.press

The choice of the control frequency of a system has a relevant impact on the ability of
reinforcement learning algorithms to learn a highly performing policy. In this paper, we …

Cited by 52 · Related articles · All 8 versions

Hamilton-Jacobi deep Q-Learning for deterministic continuous-time systems with Lipschitz continuous controls

J Kim, J Shin, I Yang - Journal of Machine Learning Research, 2021 - jmlr.org

In this paper, we propose Q-learning algorithms for continuous-time deterministic optimal
control problems with Lipschitz continuous controls. A new class of Hamilton-Jacobi …

Cited by 36 · Related articles · All 7 versions

Value-gradient iteration with quadratic approximate value functions

A Yang, S Boyd - Annual Reviews in Control, 2023 - Elsevier

We propose a method for designing policies for convex stochastic control problems
characterized by random linear dynamics and convex stage cost. We consider policies that …

[PDF] Utilizing the natural gradient in temporal difference reinforcement learning with eligibility traces

T Morimura, E Uchibe, K Doya - International Symposium on …, 2005 - researchgate.net

The policy gradient method is a promising approach in reinforcement learning (RL) for
optimizing action policy parameters in order to maximize average reward. The natural …

Cited by 43 · Related articles · All 5 versions

Reinforcement learning by value gradients

M Fairbank - arXiv preprint arXiv:0803.3539, 2008 - arxiv.org

The concept of the value-gradient is introduced and developed in the context of
reinforcement learning. It is shown that by learning the value-gradients exploration or …

Cited by 26 · Related articles · All 3 versions

[Book][B] Exploiting environment configurability in reinforcement learning

AM Metelli - 2022 - books.google.com

In recent decades, Reinforcement Learning (RL) has emerged as an effective approach to
address complex control tasks. In a Markov Decision Process (MDP), the framework typically …

Cited by 7 · Related articles · All 5 versions

On a Variance Reduction Correction of the Temporal Difference for Policy Evaluation in the Stochastic Continuous Setting

Z Kobeissi, F Bach - 2022 - hal.umontpellier.fr

This paper deals with solving continuous time, state and action optimization problems in
stochastic settings, using reinforcement learning algorithms, and considers the policy …

Cited by 2 · Related articles

An approach for nonlinear control design via approximate dynamic programming

CI Boussios - 1998 - dspace.mit.edu

This thesis proposes and studies a methodology for designing controllers for nonlinear
dynamic systems. We are interested in state feedback controllers (policies) that stabilize the …

Cited by 9 · Related articles · All 3 versions