Exploration–Exploitation Mechanisms in Recurrent Neural Networks and Human Learners in Restless Bandit Problems

D Tuzsus, A Brands, I Pappas, J Peters - Computational Brain & Behavior, 2024 - Springer
A key feature of animal and human decision-making is to balance the exploration of
unknown options for information gain (directed exploration) versus selecting known options …

Human-level reinforcement learning performance of recurrent neural networks is linked to hyperperseveration, not directed exploration

D Tuzsus, I Pappas, J Peters - bioRxiv, 2023 - biorxiv.org
A key feature of animal and human decision-making is to balance exploring unknown
options for information gain (directed exploration) versus exploiting known options for …

Impact of multi-armed bandit strategies on deep recurrent reinforcement learning

V Zangirolami, M Borrotti - arXiv preprint arXiv:2310.08331, 2023 - arxiv.org
Incomplete knowledge of the environment leads an agent to make decisions under
uncertainty. One of the major dilemmas in Reinforcement Learning (RL) where an …

Program-Based Strategy Induction for Reinforcement Learning

CG Correa, TL Griffiths, ND Daw - arXiv preprint arXiv:2402.16668, 2024 - arxiv.org
Typical models of learning assume incremental estimation of continuously-varying decision
variables like expected rewards. However, this class of models fails to capture more …

Bridging Computational Neuroscience and Machine Learning on Non-Stationary Multi-Armed Bandits

G Velentzas, C Tzafestas, M Khamassi - bioRxiv, 2017 - biorxiv.org
Fast adaptation to changes in the environment requires both natural and artificial agents to
be able to dynamically tune an exploration-exploitation trade-off during learning. This trade …

RL or not RL? Parsing the processes that support human reward-based learning.

AGE Collins - files.osf.io
Reinforcement Learning (RL) algorithms have had tremendous success accounting for
reward-based learning across species, in both behavior and brain. In particular, simple …

Bio-inspired meta-learning for active exploration during non-stationary multi-armed bandit tasks

G Velentzas, C Tzafestas… - 2017 Intelligent Systems …, 2017 - ieeexplore.ieee.org
Fast adaptation to changes in the environment requires agents (animals, robots and
simulated artefacts) to be able to dynamically tune an exploration-exploitation trade-off …

Parameter and model recovery of reinforcement learning models for restless bandit problems

L Danwitz, D Mathar, E Smith, D Tuzsus… - Computational Brain & …, 2022 - Springer
Multi-armed restless bandit tasks are regularly applied in psychology and cognitive
neuroscience to assess exploration and exploitation behavior in structured environments …

Harnessing the flexibility of neural networks to predict dynamic theoretical parameters underlying human choice behavior

Y Ger, E Nachmani, L Wolf… - PLoS Computational …, 2024 - journals.plos.org
Reinforcement learning (RL) models are used extensively to study human behavior. These
rely on normative models of behavior and stress interpretability over predictive capabilities …

Connecting exploration, generalization, and planning in correlated trees

T Ludwig, CM Wu, E Schulz - … of the Annual Meeting of the …, 2022 - escholarship.org
Human reinforcement learning (RL) is characterized by different challenges. Exploration has
been studied extensively in multi-armed bandits, while planning has been investigated in …