The causal structure and computational value of narratives

J Chen, AM Bornstein - Trends in Cognitive Sciences, 2024 - cell.com
Many human behavioral and brain imaging studies have used narratively structured stimuli
(e.g., written, audio, or audiovisual stories) to better emulate real-world experience in the …

Dual credit assignment processes underlie dopamine signals in a complex spatial environment

TA Krausz, AE Comrie, AE Kahn, LM Frank, ND Daw… - Neuron, 2023 - cell.com
Animals frequently make decisions based on expectations of future reward ("values").
Values are updated by ongoing experience: places and choices that result in reward are …

Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis

A Meulemans, S Schug… - Advances in Neural …, 2024 - proceedings.neurips.cc
To make reinforcement learning more sample efficient, we need better credit assignment
methods that measure an action's influence on future rewards. Building upon Hindsight …

Maximum state entropy exploration using predecessor and successor representations

AK Jain, L Lehnert, I Rish… - Advances in Neural …, 2024 - proceedings.neurips.cc
Animals have a developed ability to explore that aids them in important tasks such as
locating food, exploring for shelter, and finding misplaced items. These exploration skills …

A survey of temporal credit assignment in deep reinforcement learning

E Pignatelli, J Ferret, M Geist, T Mesnard… - arXiv preprint arXiv …, 2023 - arxiv.org
The Credit Assignment Problem (CAP) refers to the longstanding challenge of
Reinforcement Learning (RL) agents to associate actions with their long-term …

The pitfalls of regularization in off-policy TD learning

G Manek, JZ Kolter - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Temporal Difference (TD) learning is ubiquitous in reinforcement learning, where it is often
combined with off-policy sampling and function approximation. Unfortunately learning with …

Forethought and hindsight in credit assignment

V Chelu, D Precup… - Advances in Neural …, 2020 - proceedings.neurips.cc
We address the problem of credit assignment in reinforcement learning and explore
fundamental questions regarding the way in which an agent can best use additional …

Optimal control of nonlinear system based on deterministic policy gradient with eligibility traces

J Rao, J Wang, J Xu, S Zhao - Nonlinear Dynamics, 2023 - Springer
Optimal control of nonlinear systems by using adaptive dynamic programming (ADP)
methods is always a hot topic in recent years. However, unknown nonlinear systems with …

An information-theoretic perspective on credit assignment in reinforcement learning

D Arumugam, P Henderson, PL Bacon - arXiv preprint arXiv:2103.06224, 2021 - arxiv.org
How do we formalize the challenge of credit assignment in reinforcement learning?
Common intuition would draw attention to reward sparsity as a key contributor to difficult …

Learning expected emphatic traces for deep RL

R Jiang, S Zhang, V Chelu, A White… - Proceedings of the AAAI …, 2022 - ojs.aaai.org
Off-policy sampling and experience replay are key for improving sample efficiency and
scaling model-free temporal difference learning methods. When combined with function …