[HTML] Exploration–Exploitation Mechanisms in Recurrent Neural Networks and Human Learners in Restless Bandit Problems
A key feature of animal and human decision-making is to balance the exploration of
unknown options for information gain (directed exploration) versus selecting known options …
Human-level reinforcement learning performance of recurrent neural networks is linked to hyperperseveration, not directed exploration
A key feature of animal and human decision-making is to balance exploring unknown
options for information gain (directed exploration) versus exploiting known options for …
Impact of multi-armed bandit strategies on deep recurrent reinforcement learning
V Zangirolami, M Borrotti - arXiv preprint arXiv:2310.08331, 2023 - arxiv.org
Incomplete knowledge of the environment leads an agent to make decisions under
uncertainty. One of the major dilemmas in Reinforcement Learning (RL) where an …
Program-Based Strategy Induction for Reinforcement Learning
Typical models of learning assume incremental estimation of continuously-varying decision
variables like expected rewards. However, this class of models fails to capture more …
Bridging Computational Neuroscience and Machine Learning on Non-Stationary Multi-Armed Bandits
Fast adaptation to changes in the environment requires both natural and artificial agents to
be able to dynamically tune an exploration-exploitation trade-off during learning. This trade …
[PDF] RL or not RL? Parsing the processes that support human reward-based learning.
AGE Collins - files.osf.io
Reinforcement Learning (RL) algorithms have had tremendous success accounting for
reward-based learning across species, in both behavior and brain. In particular, simple …
Bio-inspired meta-learning for active exploration during non-stationary multi-armed bandit tasks
G Velentzas, C Tzafestas… - 2017 Intelligent Systems …, 2017 - ieeexplore.ieee.org
Fast adaptation to changes in the environment requires agents (animals, robots and
simulated artefacts) to be able to dynamically tune an exploration-exploitation trade-off …
[HTML] Parameter and model recovery of reinforcement learning models for restless bandit problems
L Danwitz, D Mathar, E Smith, D Tuzsus… - Computational Brain & …, 2022 - Springer
Multi-armed restless bandit tasks are regularly applied in psychology and cognitive
neuroscience to assess exploration and exploitation behavior in structured environments …
[HTML] Harnessing the flexibility of neural networks to predict dynamic theoretical parameters underlying human choice behavior
Reinforcement learning (RL) models are used extensively to study human behavior. These
rely on normative models of behavior and stress interpretability over predictive capabilities …
Connecting exploration, generalization, and planning in correlated trees
Human reinforcement learning (RL) is characterized by different challenges. Exploration has
been studied extensively in multi-armed bandits, while planning has been investigated in …