Variance penalized on-policy and off-policy actor-critic

M Gimelfarb, A Barreto, S Sanner… - Advances in Neural …, 2021 - proceedings.neurips.cc

Sample efficiency and risk-awareness are central to the development of practical
reinforcement learning (RL) for complex decision-making. The former can be addressed by …

Cited by 24 Related articles All 10 versions

[PDF] arxiv.org

Safe option-critic: learning safety in the option-critic architecture

A Jain, K Khetarpal, D Precup - The Knowledge Engineering Review, 2021 - cambridge.org

Designing hierarchical reinforcement learning algorithms that exhibit safe behaviour is not
only vital for practical applications but also facilitates a better understanding of an agent's …

Cited by 37 Related articles All 7 versions

[PDF] neurips.cc

A unifying framework of off-policy general value function evaluation

T Xu, Z Yang, Z Wang, Y Liang - Advances in Neural …, 2022 - proceedings.neurips.cc

General Value Function (GVF) is a powerful tool to represent both the predictive
and retrospective knowledge in reinforcement learning (RL). In practice …

Cited by 3 Related articles All 7 versions

[PDF] arxiv.org

Robust reinforcement learning with distributional risk-averse formulation

P Clavier, S Allassonière, EL Pennec - arXiv preprint arXiv:2206.06841, 2022 - arxiv.org

Robust Reinforcement Learning tries to make predictions more robust to changes in the
dynamics or rewards of the system. This problem is particularly important when the …

Cited by 8 Related articles All 5 versions

[PDF] arxiv.org

CAT: Caution Aware Transfer in Reinforcement Learning via Distributional Risk

MFEH Chehade, AS Bedi, A Zhang, H Zhu - arXiv preprint arXiv …, 2024 - arxiv.org

Transfer learning in reinforcement learning (RL) has become a pivotal strategy for improving
data efficiency in new, unseen tasks by utilizing knowledge from previously learned tasks …

[PDF] neurips.cc

Taylor TD-learning

M Garibbo, M Robeyns… - Advances in Neural …, 2024 - proceedings.neurips.cc

Many reinforcement learning approaches rely on temporal-difference (TD) learning to learn
a critic. However, TD-learning updates can be high variance. Here, we introduce a model …

Cited by 3 Related articles All 7 versions

[PDF] arxiv.org

Adaptive Exploration for Data-Efficient General Value Function Evaluations

A Jain, JP Hanna, D Precup - arXiv preprint arXiv:2405.07838, 2024 - arxiv.org

General Value Functions (GVFs) (Sutton et al., 2011) are an established way to represent
predictive knowledge in reinforcement learning. Each GVF computes the expected return for …

Cited by 1 Related articles All 3 versions

[PDF] arxiv.org

A unified off-policy evaluation approach for general value function

T Xu, Z Yang, Z Wang, Y Liang - arXiv preprint arXiv:2107.02711, 2021 - arxiv.org

General Value Function (GVF) is a powerful tool to represent both the predictive and
retrospective knowledge in reinforcement learning (RL). In practice, often multiple …

Cited by 3 Related articles All 3 versions

Shapley-Optimized Reinforcement Learning for Human-Machine Collaboration Policy

J Zhang, Y Niu, W He, C Jin, C Wang - International Conference on …, 2024 - Springer

Human-machine collaboration is a promising training framework aimed at learning optimal
strategies in high-cost exploration scenarios. However, such work is challenging. On one …

[PDF] umich.edu

Reinforcement Learning based Sequential and Robust Bayesian Optimal Experimental Design

W Shen - 2023 - deepblue.lib.umich.edu

Optimal experimental design (OED) is a statistical approach aimed at designing experiments
in order to extract maximum information from them. It entails carefully selecting experimental …

Cited by 2 Related articles