All versions - Academic resource search

Adaptive temporal difference learning with linear function approximation

T Sun, H Shen, T Chen, D Li - IEEE Transactions on Pattern …, 2021 - ieeexplore.ieee.org

This paper revisits the temporal difference (TD) learning algorithm for the policy evaluation
tasks in reinforcement learning. Typically, the performance of TD(0) and TD(λ) is very …

Cited by: 31 Related articles

Adaptive Temporal Difference Learning with Linear Function Approximation

T Sun, H Shen, T Chen, D Li - arXiv preprint arXiv:2002.08537, 2020 - arxiv.org

This paper revisits the temporal difference (TD) learning algorithm for the policy evaluation
tasks in reinforcement learning. Typically, the performance of TD (0) and TD ($\lambda $) is …

Adaptive Temporal Difference Learning With Linear Function Approximation

T Sun, H Shen, T Chen, D Li - IEEE transactions on …, 2022 - pubmed.ncbi.nlm.nih.gov

This paper revisits the temporal difference (TD) learning algorithm for the policy evaluation
tasks in reinforcement learning. Typically, the performance of TD (0) and TD (λ) is very …

Adaptive Temporal Difference Learning With Linear Function Approximation

T Sun, H Shen, T Chen, D Li - IEEE Transactions on Pattern Analysis …, 2022 - computer.org

This paper revisits the temporal difference (TD) learning algorithm for the policy evaluation
tasks in reinforcement learning. Typically, the performance of TD(0) and TD(λ) is very …

Adaptive Temporal Difference Learning with Linear Function Approximation

T Sun, H Shen, T Chen, D Li - arXiv e-prints, 2020 - ui.adsabs.harvard.edu

This paper revisits the temporal difference (TD) learning algorithm for the policy evaluation
tasks in reinforcement learning. Typically, the performance of TD (0) and TD ($\lambda $) is …

Adaptive Temporal Difference Learning With Linear Function Approximation.

T Sun, H Shen, T Chen, D Li - IEEE Transactions on Pattern …, 2022 - europepmc.org

This paper revisits the temporal difference (TD) learning algorithm for the policy evaluation
tasks in reinforcement learning. Typically, the performance of TD (0) and TD (λ) is very …