Accelerated gradient temporal difference learning

Y Pan, A White, M White - Proceedings of the AAAI Conference on …, 2017 - ojs.aaai.org
The family of temporal difference (TD) methods spans a spectrum from computationally frugal
linear methods like TD(λ) to data-efficient least-squares methods. Least-squares methods …
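
At the frugal end of that spectrum, linear TD(λ) needs only O(d) compute and memory per step. A minimal sketch, assuming a fixed linear feature map and accumulating traces; the names `alpha`, `gamma`, and `lam` are illustrative, not taken from the paper:

```python
import numpy as np

def td_lambda_update(w, z, phi, r, phi_next, alpha=0.01, gamma=0.99, lam=0.9):
    """One linear TD(lambda) step with an accumulating eligibility trace.

    w        -- weight vector; the value estimate is V(s) = w @ phi(s)
    z        -- eligibility trace, same shape as w
    phi      -- feature vector of the current state
    r        -- observed reward
    phi_next -- feature vector of the next state
    """
    delta = r + gamma * (w @ phi_next) - (w @ phi)  # TD error
    z = gamma * lam * z + phi                       # decay trace, add current features
    w = w + alpha * delta * z                       # O(d) update per step
    return w, z
```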

Meta-descent for online, continual prediction

A Jacobsen, M Schlegel, C Linke, T Degris… - Proceedings of the …, 2019 - ojs.aaai.org
This paper investigates different vector step-size adaptation approaches for non-stationary
online, continual prediction problems. Vanilla stochastic gradient descent can be …
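
The vector step-size idea this line of work builds on goes back to Sutton's IDBD, which maintains one adaptive step size per weight. A minimal sketch of IDBD for a linear least-mean-squares learner; the paper's own meta-descent algorithms differ in detail:

```python
import numpy as np

def idbd_update(w, beta, h, x, target, theta=0.01):
    """One step of Sutton's IDBD: per-weight (vector) step sizes
    alpha_i = exp(beta_i) are adapted online by meta-descent, so the
    meta step size `theta` is the only tuned scalar."""
    delta = target - w @ x                     # prediction error
    beta = beta + theta * delta * x * h        # meta-descent on log step sizes
    alpha = np.exp(beta)                       # per-weight step sizes
    w = w + alpha * delta * x                  # LMS update with a vector step size
    h = h * np.clip(1.0 - alpha * x * x, 0.0, None) + alpha * delta * x
    return w, beta, h
```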

[BOOK][B] Generalization, optimization, diverse generation: insights and advances in the use of bootstrapping in deep neural networks

E Bengio - 2022 - search.proquest.com
This thesis investigates the use of bootstrapping in Temporal Difference (TD) learning, a
central mechanism in reinforcement learning (RL), when applied to deep neural networks. I …
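
For context, bootstrapping means the regression target for the value function contains the learner's own current estimate rather than a full Monte Carlo return. A minimal sketch of the one-step bootstrapped target; `q_next` stands in for whatever next-state value estimate the agent uses:

```python
def td_target(r, gamma, done, q_next):
    """One-step bootstrapped TD target: unlike a Monte Carlo return,
    the label r + gamma * q_next reuses the learner's own estimate
    q_next of the next state's value (zeroed at episode end)."""
    return r + gamma * (1.0 - done) * q_next
```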

Accelerated Gradient Algorithms for Robust Temporal Difference Learning

DJ Meyer - 2021 - mediatum.ub.tum.de
This thesis deals with linearly approximated gradient temporal difference learning. The
applicability of the underlying cost functions is discussed and investigated with respect to …
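
The usual cost function in linearly approximated gradient TD work is the mean-squared projected Bellman error (MSPBE); the thesis may analyse variants, but the standard form (due to Sutton et al.) is:

```latex
\mathrm{MSPBE}(\theta)
  = \lVert V_\theta - \Pi T^{\pi} V_\theta \rVert_D^2
  = \mathbb{E}[\delta\,\phi]^{\top}\,\mathbb{E}[\phi\phi^{\top}]^{-1}\,\mathbb{E}[\delta\,\phi],
\qquad
\delta = r + \gamma\,\theta^{\top}\phi' - \theta^{\top}\phi .
```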

Accelerated algorithms for temporal difference learning methods

A Rankawat - 2023 - papyrus.bib.umontreal.ca
The central idea of this thesis is to understand the notion of acceleration in stochastic
approximation algorithms. Specifically, we attempt to answer the question: How does …
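
One standard notion of acceleration in stochastic approximation is Polyak-Ruppert iterate averaging (momentum and Nesterov variants are the other common route); whether the thesis studies this exact scheme is an assumption. A minimal sketch, with `grad_fn` a hypothetical noisy-update oracle:

```python
import numpy as np

def sa_with_averaging(grad_fn, w0, steps=10_000, c=1.0, p=0.75, rng=None):
    """Stochastic approximation w_{t+1} = w_t - a_t * grad_fn(w_t) with a
    slowly decaying step size a_t = c / t**p, plus Polyak-Ruppert averaging
    of the iterates, which recovers the optimal asymptotic rate."""
    rng = rng or np.random.default_rng(0)
    w = np.array(w0, dtype=float)
    w_bar = w.copy()
    for t in range(1, steps + 1):
        a_t = c / t**p                      # slow decay: p in (1/2, 1)
        w = w - a_t * grad_fn(w, rng)       # noisy update direction
        w_bar += (w - w_bar) / t            # running average of iterates
    return w_bar
```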

Improving Sample Efficiency of Online Temporal Difference Learning

Y Pan - 2021 - era.library.ualberta.ca
A common scientific challenge in putting a reinforcement learning agent into practice is how
to improve sample efficiency as much as possible under limited computational or memory …

Vector Step-size Adaptation for Continual, Online Prediction

A Jacobsen - 2019 - era.library.ualberta.ca
In this thesis, we investigate different vector step-size adaptation approaches for continual,
online prediction problems. Vanilla stochastic gradient descent can be considerably …