Hierarchical reinforcement learning: A survey and open research challenges
Reinforcement learning (RL) allows an agent to solve sequential decision-making problems
by interacting with an environment in a trial-and-error fashion. When these environments are …
Large sequence models for sequential decision-making: a survey
Transformer architectures have facilitated the development of large-scale and general-
purpose sequence models for prediction tasks in natural language processing and computer …
Decision transformer: Reinforcement learning via sequence modeling
We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence
modeling problem. This allows us to draw upon the simplicity and scalability of the …
Agent57: Outperforming the Atari human benchmark
Atari games have been a long-standing benchmark in the reinforcement learning (RL)
community for the past decade. This benchmark was proposed to test general competency …
Toward human-in-the-loop AI: Enhancing deep reinforcement learning via real-time human guidance for autonomous driving
Due to its limited intelligence and abilities, machine learning is currently unable to handle
various situations and thus cannot completely replace humans in real-world applications …
Recurrent model-free RL can be a strong baseline for many POMDPs
Many problems in RL, such as meta-RL, robust RL, generalization in RL, and temporal credit
assignment, can be cast as POMDPs. In theory, simply augmenting model-free RL with …
RUDDER: Return decomposition for delayed rewards
JA Arjona-Medina, M Gillhofer… - Advances in …, 2019 - proceedings.neurips.cc
We propose RUDDER, a novel reinforcement learning approach for delayed rewards in
finite Markov decision processes (MDPs). In MDPs the Q-values are equal to the expected …
Dual credit assignment processes underlie dopamine signals in a complex spatial environment
Animals frequently make decisions based on expectations of future reward ("values").
Values are updated by ongoing experience: places and choices that result in reward are …
Counterfactual credit assignment in model-free reinforcement learning
Credit assignment in reinforcement learning is the problem of measuring an action's
influence on future rewards. In particular, this requires separating skill from luck, i.e. …
Dense reward for free in reinforcement learning from human feedback
Reinforcement Learning from Human Feedback (RLHF) has been credited as the key
advance that has allowed Large Language Models (LLMs) to effectively follow instructions …