Reinforcement learning in continuous time and space
K Doya - Neural computation, 2000 - direct.mit.edu
This article presents a reinforcement learning framework for continuous-time dynamical
systems without a priori discretization of time, state, and action. Based on the Hamilton-Jacobi …
[BOOK][B] Approximate solutions to Markov decision processes
GJ Gordon - 1999 - search.proquest.com
One of the basic problems of machine learning is deciding how to act in an uncertain world.
For example, if I want my robot to bring me a cup of coffee, it must be able to compute the …
Control frequency adaptation via action persistence in batch reinforcement learning
AM Metelli, F Mazzolini, L Bisi… - International …, 2020 - proceedings.mlr.press
The choice of the control frequency of a system has a relevant impact on the ability of
reinforcement learning algorithms to learn a highly performing policy. In this paper, we …
Hamilton-Jacobi deep Q-Learning for deterministic continuous-time systems with Lipschitz continuous controls
In this paper, we propose Q-learning algorithms for continuous-time deterministic optimal
control problems with Lipschitz continuous controls. A new class of Hamilton-Jacobi …
Value-gradient iteration with quadratic approximate value functions
We propose a method for designing policies for convex stochastic control problems
characterized by random linear dynamics and convex stage cost. We consider policies that …
[PDF][PDF] Utilizing the natural gradient in temporal difference reinforcement learning with eligibility traces
The policy gradient method is a promising approach in reinforcement learning (RL) for
optimizing action policy parameters in order to maximize average reward. The natural …
Reinforcement learning by value gradients
M Fairbank - arXiv preprint arXiv:0803.3539, 2008 - arxiv.org
The concept of the value-gradient is introduced and developed in the context of
reinforcement learning. It is shown that by learning the value-gradients exploration or …
[BOOK][B] Exploiting environment configurability in reinforcement learning
AM Metelli - 2022 - books.google.com
In recent decades, Reinforcement Learning (RL) has emerged as an effective approach to
address complex control tasks. In a Markov Decision Process (MDP), the framework typically …
On a Variance Reduction Correction of the Temporal Difference for Policy Evaluation in the Stochastic Continuous Setting
Z Kobeissi, F Bach - 2022 - hal.umontpellier.fr
This paper deals with solving continuous time, state and action optimization problems in
stochastic settings, using reinforcement learning algorithms, and considers the policy …
An approach for nonlinear control design via approximate dynamic programming
CI Boussios - 1998 - dspace.mit.edu
This thesis proposes and studies a methodology for designing controllers for nonlinear
dynamic systems. We are interested in state feedback controllers (policies) that stabilize the …