Reinforcement learning in continuous time and space

K Doya - Neural computation, 2000 - direct.mit.edu
This article presents a reinforcement learning framework for continuous-time dynamical
systems without a priori discretization of time, state, and action. Based on the Hamilton-Jacobi …

[BOOK][B] Approximate solutions to Markov decision processes

GJ Gordon - 1999 - search.proquest.com
One of the basic problems of machine learning is deciding how to act in an uncertain world.
For example, if I want my robot to bring me a cup of coffee, it must be able to compute the …

Control frequency adaptation via action persistence in batch reinforcement learning

AM Metelli, F Mazzolini, L Bisi… - International …, 2020 - proceedings.mlr.press
The choice of the control frequency of a system has a relevant impact on the ability of
reinforcement learning algorithms to learn a highly performing policy. In this paper, we …

Hamilton-Jacobi deep Q-Learning for deterministic continuous-time systems with Lipschitz continuous controls

J Kim, J Shin, I Yang - Journal of Machine Learning Research, 2021 - jmlr.org
In this paper, we propose Q-learning algorithms for continuous-time deterministic optimal
control problems with Lipschitz continuous controls. A new class of Hamilton-Jacobi …

Value-gradient iteration with quadratic approximate value functions

A Yang, S Boyd - Annual Reviews in Control, 2023 - Elsevier
We propose a method for designing policies for convex stochastic control problems
characterized by random linear dynamics and convex stage cost. We consider policies that …

[PDF][PDF] Utilizing the natural gradient in temporal difference reinforcement learning with eligibility traces

T Morimura, E Uchibe, K Doya - International Symposium on …, 2005 - researchgate.net
The policy gradient method is a promising approach in reinforcement learning (RL) for
optimizing action policy parameters in order to maximize average reward. The natural …

Reinforcement learning by value gradients

M Fairbank - arXiv preprint arXiv:0803.3539, 2008 - arxiv.org
The concept of the value-gradient is introduced and developed in the context of
reinforcement learning. It is shown that by learning the value-gradients exploration or …

[BOOK][B] Exploiting environment configurability in reinforcement learning

AM Metelli - 2022 - books.google.com
In recent decades, Reinforcement Learning (RL) has emerged as an effective approach to
address complex control tasks. In a Markov Decision Process (MDP), the framework typically …

On a Variance Reduction Correction of the Temporal Difference for Policy Evaluation in the Stochastic Continuous Setting

Z Kobeissi, F Bach - 2022 - hal.umontpellier.fr
This paper deals with solving continuous time, state and action optimization problems in
stochastic settings, using reinforcement learning algorithms, and considers the policy …

An approach for nonlinear control design via approximate dynamic programming

CI Boussios - 1998 - dspace.mit.edu
This thesis proposes and studies a methodology for designing controllers for nonlinear
dynamic systems. We are interested in state feedback controllers (policies) that stabilize the …