iLSTD: Eligibility traces and convergence analysis

X Xu, L Zuo, Z Huang - Information sciences, 2014 - Elsevier

In recent years, the research on reinforcement learning (RL) has focused on function
approximation in learning prediction and control of Markov decision processes (MDPs). The …

Cited by 227 · Related articles · All 6 versions

[PDF] bookfusion.com

[BOOK] Algorithms for reinforcement learning

C Szepesvári - 2022 - books.google.com

Reinforcement learning is a learning paradigm concerned with learning to control a system
so as to maximize a numerical performance measure that expresses a long-term objective …

Cited by 2243 · Related articles · All 24 versions

[PDF] uliege.be

[BOOK] Reinforcement learning and dynamic programming using function approximators

L Busoniu, R Babuska, B De Schutter, D Ernst - 2017 - taylorfrancis.com

From household appliances to applications in robotics, engineered systems involving
complex dynamics can only be as effective as the algorithms that control them. While …

Cited by 1297 · Related articles · All 12 versions

[PDF] jmlr.org

[PDF] Policy evaluation with temporal differences: A survey and comparison

C Dann, G Neumann, J Peters - The Journal of Machine Learning …, 2014 - jmlr.org

Policy evaluation is an essential step in most reinforcement learning approaches. It yields a
value function, the quality assessment of states for a given policy, which can be used in a …

Cited by 305 · Related articles · All 21 versions

[PDF] mlr.press

Stochastic variance reduction methods for policy evaluation

SS Du, J Chen, L Li, L Xiao… - … Conference on Machine …, 2017 - proceedings.mlr.press

Policy evaluation is concerned with estimating the value function that predicts long-term
values of states under a given policy. It is a crucial step in many reinforcement-learning …

Cited by 218 · Related articles · All 6 versions

[PDF] hadovanhasselt.com

Reinforcement learning in continuous state and action spaces

H Van Hasselt - Reinforcement Learning: State-of-the-Art, 2012 - Springer

Many traditional reinforcement-learning algorithms have been designed for problems with
small finite state and action spaces. Learning in such discrete problems can be difficult …

Cited by 306 · Related articles · All 10 versions

[PDF] neurips.cc

Error propagation for approximate policy and value iteration

A Farahmand, C Szepesvári… - Advances in neural …, 2010 - proceedings.neurips.cc

We address the question of how the approximation error/Bellman residual at each iteration
of the Approximate Policy/Value Iteration algorithms influences the quality of the resulted …

Cited by 283 · Related articles · All 22 versions

[PDF] neurips.cc

Loss dynamics of temporal difference reinforcement learning

B Bordelon, P Masset, H Kuo… - Advances in Neural …, 2024 - proceedings.neurips.cc

Reinforcement learning has been successful across several applications in which agents
have to learn to act in environments with sparse feedback. However, despite this empirical …

Cited by 5 · Related articles · All 5 versions

[PDF] hal.science

Least-squares methods for policy iteration

L Buşoniu, A Lazaric, M Ghavamzadeh… - … learning: state-of-the-art, 2012 - Springer

Approximate reinforcement learning deals with the essential problem of applying
reinforcement learning in large and continuous state-action spaces, by using function …

Cited by 37 · Related articles · All 22 versions

[PDF] aaai.org

Accelerated gradient temporal difference learning

Y Pan, A White, M White - Proceedings of the AAAI Conference on …, 2017 - ojs.aaai.org

The family of temporal difference (TD) methods spans a spectrum from computationally frugal
linear methods like TD(λ) to data efficient least squares methods. Least squares methods …

Cited by 35 · Related articles · All 13 versions