Large sequence models for sequential decision-making: a survey

M Wen, R Lin, H Wang, Y Yang, Y Wen, L Mai… - Frontiers of Computer …, 2023 - Springer
Transformer architectures have facilitated the development of large-scale and
general-purpose sequence models for prediction tasks in natural language processing and computer …

Decision transformer: Reinforcement learning via sequence modeling

L Chen, K Lu, A Rajeswaran, K Lee… - Advances in neural …, 2021 - proceedings.neurips.cc
We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence
modeling problem. This allows us to draw upon the simplicity and scalability of the …
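
The core move this abstract describes, tokenizing a trajectory as interleaved (return-to-go, state, action) triples and training a causal transformer to predict actions, can be sketched compactly. The sketch below is an illustrative reconstruction, not the authors' released code: the module name, dimensions, and the use of PyTorch's stock TransformerEncoder are all assumptions.

```python
# Illustrative Decision-Transformer-style model (assumed architecture, not the
# paper's exact code): a trajectory becomes a sequence of (return-to-go,
# state, action) tokens, and a causal transformer predicts each action.
import torch
import torch.nn as nn

class DecisionTransformerSketch(nn.Module):
    def __init__(self, state_dim, act_dim, d_model=128, n_layers=3,
                 n_heads=4, max_len=64):
        super().__init__()
        self.embed_rtg = nn.Linear(1, d_model)           # return-to-go token
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(act_dim, d_model)
        self.embed_time = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        self.predict_action = nn.Linear(d_model, act_dim)

    def forward(self, rtg, states, actions, timesteps):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim),
        # timesteps: (B, T) integer indices.
        t = self.embed_time(timesteps)
        tokens = torch.stack(
            [self.embed_rtg(rtg) + t,
             self.embed_state(states) + t,
             self.embed_action(actions) + t], dim=2)
        tokens = tokens.reshape(rtg.size(0), -1, t.size(-1))  # interleave R,s,a
        n = tokens.size(1)
        causal = torch.triu(torch.ones(n, n, dtype=torch.bool,
                                       device=tokens.device), diagonal=1)
        h = self.transformer(tokens, mask=causal)
        return self.predict_action(h[:, 1::3])  # read actions off state tokens
```

At evaluation time, conditioning on a high target return-to-go steers the model toward high-return behavior, which is what lets a purely supervised sequence model act as an RL policy.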

Transformers in reinforcement learning: a survey

P Agarwal, AA Rahman, PL St-Charles… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have significantly impacted domains like natural language processing,
computer vision, and robotics, where they improve performance compared to other neural …

When do transformers shine in RL? Decoupling memory from credit assignment

T Ni, M Ma, B Eysenbach… - Advances in Neural …, 2023 - proceedings.neurips.cc
Reinforcement learning (RL) algorithms face two distinct challenges: learning effective
representations of past and present observations, and determining how actions influence …

Recurrent model-free RL can be a strong baseline for many POMDPs

T Ni, B Eysenbach, R Salakhutdinov - arXiv preprint arXiv:2110.05038, 2021 - arxiv.org
Many problems in RL, such as meta-RL, robust RL, generalization in RL, and temporal credit
assignment, can be cast as POMDPs. In theory, simply augmenting model-free RL with …
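
The augmentation this abstract alludes to, adding recurrence so a hidden state summarizes the observation history of a POMDP, is architecturally simple. Below is a minimal sketch under assumed names and sizes; it is one plausible instantiation (a GRU actor fed observation and previous-action sequences), not the paper's exact implementation.

```python
# Minimal recurrent policy sketch for POMDPs (illustrative, assumed design):
# the GRU's hidden state serves as a learned summary of past observations.
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.gru = nn.GRU(obs_dim + act_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs, prev_actions, h0=None):
        # obs: (B, T, obs_dim); prev_actions: (B, T, act_dim)
        x = torch.cat([obs, prev_actions], dim=-1)
        out, h = self.gru(x, h0)    # h carries memory across timesteps
        return self.head(out), h    # per-step action logits, final hidden state
```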

Counterfactual identifiability of bijective causal models

A Nasr-Esfahany, M Alizadeh… - … Conference on Machine …, 2023 - proceedings.mlr.press
We study counterfactual identifiability in causal models with bijective generation
mechanisms (BGM), a class that generalizes several widely-used causal models in the …
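
For readers unfamiliar with the setup: in a bijective generation mechanism, each variable is produced by a function that is invertible in its exogenous noise, which is what makes the abduction step of the standard abduction-action-prediction recipe exact. The notation below is an illustrative summary, not necessarily the paper's own symbols.

```latex
% Counterfactual computation for a mechanism x = f(u; pa) that is
% bijective in the noise u (symbols illustrative):
\begin{align}
  u  &= f^{-1}(x;\, \mathrm{pa})
       && \text{(abduction: recover the noise exactly)} \\
  x' &= f(u;\, \mathrm{pa}')
       && \text{(prediction under intervened parents } \mathrm{pa}'\text{)}
\end{align}
```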

MoCoDA: Model-based counterfactual data augmentation

S Pitis, E Creager, A Mandlekar… - Advances in Neural …, 2022 - proceedings.neurips.cc
The number of states in a dynamic process is exponential in the number of objects, making
reinforcement learning (RL) difficult in complex, multi-object domains. For agents to scale to …

Interpretable concept bottlenecks to align reinforcement learning agents

Q Delfosse, S Sztwiertnia, M Rothermel… - arXiv preprint arXiv …, 2024 - arxiv.org
Goal misalignment, reward sparsity and difficult credit assignment are only a few of the many
issues that make it difficult for deep reinforcement learning (RL) agents to learn optimal …

Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis

A Meulemans, S Schug… - Advances in Neural …, 2024 - proceedings.neurips.cc
To make reinforcement learning more sample efficient, we need better credit assignment
methods that measure an action's influence on future rewards. Building upon Hindsight …

On the link between conscious function and general intelligence in humans and machines

A Juliani, K Arulkumaran, S Sasai, R Kanai - arXiv preprint arXiv …, 2022 - arxiv.org
In popular media, there is often a connection drawn between the advent of awareness in
artificial agents and those same agents simultaneously achieving human or superhuman …