Non-crossing quantile regression for distributional reinforcement learning

J Hao, T Yang, H Tang, C Bai, J Liu… - … on Neural Networks …, 2023 - ieeexplore.ieee.org

Deep reinforcement learning (DRL) and deep multiagent reinforcement learning (MARL)
have achieved significant success across a wide range of domains, including game artificial …

Cited by 97 · Related articles · All 7 versions

[PDF] An analysis of quantile temporal-difference learning

M Rowland, R Munos, MG Azar, Y Tang, G Ostrovski… - 2023 - jmlr.org

We analyse quantile temporal-difference learning (QTD), a distributional reinforcement
learning algorithm that has proven to be a key component in several successful large-scale …

Cited by 21 · Related articles · All 3 versions

RiskQ: risk-sensitive multi-agent reinforcement learning value factorization

S Shen, C Ma, C Li, W Liu, Y Fu… - Advances in Neural …, 2023 - proceedings.neurips.cc

Multi-agent systems are characterized by environmental uncertainty, varying policies of
agents, and partial observability, which result in significant risks. In the context of Multi-Agent …

Cited by 3 · Related articles · All 7 versions

Uncertainty-aware reinforcement learning for risk-sensitive player evaluation in sports game

G Liu, Y Luo, O Schulte… - Advances in Neural …, 2022 - proceedings.neurips.cc

A major task of sports analytics is player evaluation. Previous methods commonly measured
the impact of players' actions on desirable outcomes (e.g., goals or winning) without …

Cited by 11 · Related articles · All 7 versions

The statistical benefits of quantile temporal-difference learning for value estimation

M Rowland, Y Tang, C Lyle, R Munos… - International …, 2023 - proceedings.mlr.press

We study the problem of temporal-difference-based policy evaluation in reinforcement
learning. In particular, we analyse the use of a distributional reinforcement learning …

Cited by 7 · Related articles · All 6 versions

Advanced reinforcement learning and its connections with brain neuroscience

C Fan, L Yao, J Zhang, Z Zhen, X Wu - Research, 2023 - spj.science.org

In recent years, brain science and neuroscience have greatly propelled the innovation of
computer science. In particular, knowledge from the neurobiology and neuropsychology of …

Cited by 13 · Related articles · All 6 versions

CVaR-Constrained Policy Optimization for Safe Reinforcement Learning

Q Zhang, S Leng, X Ma, Q Liu, X Wang… - … on Neural Networks …, 2024 - ieeexplore.ieee.org

Current constrained reinforcement learning (RL) methods guarantee constraint satisfaction
only in expectation, which is inadequate for safety-critical decision problems. Since a …

Cited by 6 · Related articles · All 3 versions

Deep non-crossing quantiles through the partial derivative

A Brando, BS Center… - International …, 2022 - proceedings.mlr.press

Quantile Regression (QR) provides a way to approximate a single conditional quantile. To
have a more informative description of the conditional distribution, QR can be merged with …

Cited by 15 · Related articles · All 3 versions

Monotonic quantile network for worst-case offline reinforcement learning

C Bai, T Xiao, Z Zhu, L Wang, F Zhou… - … on Neural Networks …, 2022 - ieeexplore.ieee.org

A key challenge in offline reinforcement learning (RL) is how to ensure the learned offline
policy is safe, especially in safety-critical domains. In this article, we focus on learning a …

Cited by 8 · Related articles · All 4 versions

Distributional reinforcement learning with monotonic splines

Y Luo, G Liu, H Duan, O Schulte… - … Conference on Learning …, 2021 - openreview.net

Distributional Reinforcement Learning (RL) differs from traditional RL by estimating the
distribution over returns to capture the intrinsic uncertainty of MDPs. One key challenge in …

Cited by 17 · Related articles · All 4 versions