Large sequence models for sequential decision-making: a survey

M Wen, R Lin, H Wang, Y Yang, Y Wen, L Mai… - Frontiers of Computer …, 2023 - Springer
Transformer architectures have facilitated the development of large-scale and
general-purpose sequence models for prediction tasks in natural language processing and computer …

Decision transformer: Reinforcement learning via sequence modeling

L Chen, K Lu, A Rajeswaran, K Lee… - Advances in neural …, 2021 - proceedings.neurips.cc
We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence
modeling problem. This allows us to draw upon the simplicity and scalability of the …
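
The core move this abstract describes, tokenizing a trajectory as interleaved (return-to-go, state, action) triples and training a causal transformer to predict actions, can be sketched compactly. The sketch below is an illustrative reconstruction, not the authors' released code: the module name, dimensions, and the use of PyTorch's stock TransformerEncoder are all assumptions.

```python
# Illustrative Decision-Transformer-style model (assumed architecture, not the
# paper's exact code): a trajectory becomes a sequence of (return-to-go,
# state, action) tokens, and a causal transformer predicts each action.
import torch
import torch.nn as nn

class DecisionTransformerSketch(nn.Module):
    def __init__(self, state_dim, act_dim, d_model=128, n_layers=3,
                 n_heads=4, max_len=64):
        super().__init__()
        self.embed_rtg = nn.Linear(1, d_model)           # return-to-go token
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(act_dim, d_model)
        self.embed_time = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        self.predict_action = nn.Linear(d_model, act_dim)

    def forward(self, rtg, states, actions, timesteps):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim),
        # timesteps: (B, T) integer indices.
        t = self.embed_time(timesteps)
        tokens = torch.stack(
            [self.embed_rtg(rtg) + t,
             self.embed_state(states) + t,
             self.embed_action(actions) + t], dim=2)
        tokens = tokens.reshape(rtg.size(0), -1, t.size(-1))  # interleave R,s,a
        n = tokens.size(1)
        causal = torch.triu(torch.ones(n, n, dtype=torch.bool,
                                       device=tokens.device), diagonal=1)
        h = self.transformer(tokens, mask=causal)
        return self.predict_action(h[:, 1::3])  # read actions off state tokens
```

At evaluation time, conditioning on a high target return-to-go steers the model toward high-return behavior, which is what lets a purely supervised sequence model act as an RL policy.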

Transformers in reinforcement learning: a survey

P Agarwal, AA Rahman, PL St-Charles… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have significantly impacted domains like natural language processing,
computer vision, and robotics, where they improve performance compared to other neural …

When do transformers shine in RL? Decoupling memory from credit assignment

T Ni, M Ma, B Eysenbach… - Advances in Neural …, 2023 - proceedings.neurips.cc
Reinforcement learning (RL) algorithms face two distinct challenges: learning effective
representations of past and present observations, and determining how actions influence …

Recurrent model-free RL can be a strong baseline for many POMDPs

T Ni, B Eysenbach, R Salakhutdinov - arXiv preprint arXiv:2110.05038, 2021 - arxiv.org
Many problems in RL, such as meta-RL, robust RL, generalization in RL, and temporal credit
assignment, can be cast as POMDPs. In theory, simply augmenting model-free RL with …
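
The augmentation this abstract alludes to, adding recurrence so a hidden state summarizes the observation history of a POMDP, is architecturally simple. Below is a minimal sketch under assumed names and sizes; it is one plausible instantiation (a GRU actor fed observation and previous-action sequences), not the paper's exact implementation.

```python
# Minimal recurrent policy sketch for POMDPs (illustrative, assumed design):
# the GRU's hidden state serves as a learned summary of past observations.
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.gru = nn.GRU(obs_dim + act_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs, prev_actions, h0=None):
        # obs: (B, T, obs_dim); prev_actions: (B, T, act_dim)
        x = torch.cat([obs, prev_actions], dim=-1)
        out, h = self.gru(x, h0)    # h carries memory across timesteps
        return self.head(out), h    # per-step action logits, final hidden state
```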

Counterfactual identifiability of bijective causal models

A Nasr-Esfahany, M Alizadeh… - … Conference on Machine …, 2023 - proceedings.mlr.press
We study counterfactual identifiability in causal models with bijective generation
mechanisms (BGM), a class that generalizes several widely-used causal models in the …
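
For readers unfamiliar with the setup: in a bijective generation mechanism, each variable is produced by a function that is invertible in its exogenous noise, which is what makes the abduction step of the standard abduction-action-prediction recipe exact. The notation below is an illustrative summary, not necessarily the paper's own symbols.

```latex
% Counterfactual computation for a mechanism x = f(u; pa) that is
% bijective in the noise u (symbols illustrative):
\begin{align}
  u  &= f^{-1}(x;\, \mathrm{pa})
       && \text{(abduction: recover the noise exactly)} \\
  x' &= f(u;\, \mathrm{pa}')
       && \text{(prediction under intervened parents } \mathrm{pa}'\text{)}
\end{align}
```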

MoCoDA: Model-based counterfactual data augmentation

S Pitis, E Creager, A Mandlekar… - Advances in Neural …, 2022 - proceedings.neurips.cc
The number of states in a dynamic process is exponential in the number of objects, making
reinforcement learning (RL) difficult in complex, multi-object domains. For agents to scale to …

Interpretable concept bottlenecks to align reinforcement learning agents

Q Delfosse, S Sztwiertnia, M Rothermel… - arXiv preprint arXiv …, 2024 - arxiv.org
Goal misalignment, reward sparsity and difficult credit assignment are only a few of the many
issues that make it difficult for deep reinforcement learning (RL) agents to learn optimal …

Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis

A Meulemans, S Schug… - Advances in Neural …, 2024 - proceedings.neurips.cc
To make reinforcement learning more sample efficient, we need better credit assignment
methods that measure an action's influence on future rewards. Building upon Hindsight …

On the link between conscious function and general intelligence in humans and machines

A Juliani, K Arulkumaran, S Sasai, R Kanai - arXiv preprint arXiv …, 2022 - arxiv.org
In popular media, there is often a connection drawn between the advent of awareness in
artificial agents and those same agents simultaneously achieving human or superhuman …