Analysis of Off-Policy Multi-Step TD-Learning with Linear Function Approximation
D Lee - arXiv preprint arXiv:2402.15781, 2024 - arxiv.org
This paper analyzes multi-step TD-learning algorithms within the 'deadly triad' scenario,
characterized by linear function approximation, off-policy learning, and bootstrapping. In …