Large sequence models for sequential decision-making: a survey
Transformer architectures have facilitated the development of large-scale and general-purpose sequence models for prediction tasks in natural language processing and computer …
Decision transformer: Reinforcement learning via sequence modeling
We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem. This allows us to draw upon the simplicity and scalability of the …
Transformers in reinforcement learning: a survey
P Agarwal, AA Rahman, PL St-Charles… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have significantly impacted domains like natural language processing, computer vision, and robotics, where they improve performance compared to other neural …
When do transformers shine in RL? Decoupling memory from credit assignment
Reinforcement learning (RL) algorithms face two distinct challenges: learning effective representations of past and present observations, and determining how actions influence …
Recurrent model-free RL can be a strong baseline for many POMDPs
Many problems in RL, such as meta-RL, robust RL, generalization in RL, and temporal credit assignment, can be cast as POMDPs. In theory, simply augmenting model-free RL with …
Counterfactual identifiability of bijective causal models
A Nasr-Esfahany, M Alizadeh… - … Conference on Machine …, 2023 - proceedings.mlr.press
We study counterfactual identifiability in causal models with bijective generation mechanisms (BGM), a class that generalizes several widely-used causal models in the …
MoCoDA: Model-based counterfactual data augmentation
The number of states in a dynamic process is exponential in the number of objects, making reinforcement learning (RL) difficult in complex, multi-object domains. For agents to scale to …
Interpretable concept bottlenecks to align reinforcement learning agents
Q Delfosse, S Sztwiertnia, M Rothermel… - arXiv preprint arXiv …, 2024 - arxiv.org
Goal misalignment, reward sparsity and difficult credit assignment are only a few of the many issues that make it difficult for deep reinforcement learning (RL) agents to learn optimal …
Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis
A Meulemans, S Schug… - Advances in Neural …, 2024 - proceedings.neurips.cc
To make reinforcement learning more sample efficient, we need better credit assignment methods that measure an action's influence on future rewards. Building upon Hindsight …
On the link between conscious function and general intelligence in humans and machines
In popular media, there is often a connection drawn between the advent of awareness in artificial agents and those same agents simultaneously achieving human or superhuman …