Harnessing Causality in Reinforcement Learning With Bagged Decision Times

D Gao, HY Lai, P Klasnja, SA Murphy - arXiv preprint arXiv:2410.14659, 2024 - arxiv.org
We consider reinforcement learning (RL) for a class of problems with bagged decision times.
A bag contains a finite sequence of consecutive decision times. The transition dynamics are …

Beyond Simple Sum of Delayed Rewards: Non-Markovian Reward Modeling for Reinforcement Learning

Y Tang, XQ Cai, JC Pang, Q Wu, YX Ding… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement Learning (RL) empowers agents to acquire various skills by learning from
reward signals. Unfortunately, designing high-quality instance-level rewards often demands …