Reinforcement learning algorithms with function approximation: Recent advances and applications

X Xu, L Zuo, Z Huang - Information sciences, 2014 - Elsevier
In recent years, the research on reinforcement learning (RL) has focused on function
approximation in learning prediction and control of Markov decision processes (MDPs). The …

[BOOK][B] Algorithms for reinforcement learning

C Szepesvári - 2022 - books.google.com
Reinforcement learning is a learning paradigm concerned with learning to control a system
so as to maximize a numerical performance measure that expresses a long-term objective …

[BOOK][B] Reinforcement learning and dynamic programming using function approximators

L Busoniu, R Babuska, B De Schutter, D Ernst - 2017 - taylorfrancis.com
From household appliances to applications in robotics, engineered systems involving
complex dynamics can only be as effective as the algorithms that control them. While …

[PDF][PDF] Policy evaluation with temporal differences: A survey and comparison

C Dann, G Neumann, J Peters - The Journal of Machine Learning …, 2014 - jmlr.org
Policy evaluation is an essential step in most reinforcement learning approaches. It yields a
value function, the quality assessment of states for a given policy, which can be used in a …

Stochastic variance reduction methods for policy evaluation

SS Du, J Chen, L Li, L Xiao… - … Conference on Machine …, 2017 - proceedings.mlr.press
Policy evaluation is concerned with estimating the value function that predicts long-term
values of states under a given policy. It is a crucial step in many reinforcement-learning …

Reinforcement learning in continuous state and action spaces

H Van Hasselt - Reinforcement Learning: State-of-the-Art, 2012 - Springer
Many traditional reinforcement-learning algorithms have been designed for problems with
small finite state and action spaces. Learning in such discrete problems can be difficult …

Error propagation for approximate policy and value iteration

A Farahmand, C Szepesvári… - Advances in neural …, 2010 - proceedings.neurips.cc
We address the question of how the approximation error/Bellman residual at each iteration
of the Approximate Policy/Value Iteration algorithms influences the quality of the resulted …

Loss dynamics of temporal difference reinforcement learning

B Bordelon, P Masset, H Kuo… - Advances in Neural …, 2024 - proceedings.neurips.cc
Reinforcement learning has been successful across several applications in which agents
have to learn to act in environments with sparse feedback. However, despite this empirical …

Least-squares methods for policy iteration

L Buşoniu, A Lazaric, M Ghavamzadeh… - … learning: state-of-the-art, 2012 - Springer
Approximate reinforcement learning deals with the essential problem of applying
reinforcement learning in large and continuous state-action spaces, by using function …

Accelerated gradient temporal difference learning

Y Pan, A White, M White - Proceedings of the AAAI Conference on …, 2017 - ojs.aaai.org
The family of temporal difference (TD) methods spans a spectrum from computationally frugal
linear methods like TD(λ) to data efficient least squares methods. Least square methods …