Exploration in deep reinforcement learning: From single-agent to multiagent domain

J Hao, T Yang, H Tang, C Bai, J Liu… - … on Neural Networks …, 2023 - ieeexplore.ieee.org
Deep reinforcement learning (DRL) and deep multiagent reinforcement learning (MARL)
have achieved significant success across a wide range of domains, including game artificial …

An analysis of quantile temporal-difference learning

M Rowland, R Munos, MG Azar, Y Tang, G Ostrovski… - 2023 - jmlr.org
We analyse quantile temporal-difference learning (QTD), a distributional reinforcement
learning algorithm that has proven to be a key component in several successful large-scale …
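For reference, here is the tabular QTD update as it is commonly written. Each state x keeps m quantile estimates θ_1(x) to θ_m(x) at levels τ_i = (2i − 1)/(2m). After a transition (x, r, x′), each estimate moves by α · (1/m) Σ_j [τ_i − 1{r + γθ_j(x′) < θ_i(x)}]. This is a minimal sketch under stated assumptions. The two-state chain is hypothetical: state 0 moves to state 1 with reward 0, and state 1 terminates with reward 0 or 1, each with probability 1/2. The values of γ, α and m are illustrative, and the function name qtd_update is also hypothetical.

```python
import numpy as np

def qtd_update(theta, x, r, x_next, done, alpha, gamma):
    """One tabular quantile TD (QTD) update for state x (hypothetical helper).

    theta has shape (num_states, m): m quantile estimates per state.
    """
    m = theta.shape[1]
    taus = (2 * np.arange(m) + 1) / (2 * m)      # quantile midpoints tau_i
    if done:
        targets = np.full(m, r, dtype=float)     # no bootstrapping at episode end
    else:
        targets = r + gamma * theta[x_next]      # one target per next-state quantile
    # For each i: fraction of targets j with target_j < theta_i(x)
    below = (targets[None, :] < theta[x][:, None]).mean(axis=1)
    theta[x] += alpha * (taus - below)           # move each quantile estimate
    return theta

# Hypothetical chain: 0 -> 1 (reward 0), then 1 -> terminal (reward 0 or 1)
rng = np.random.default_rng(0)
theta = np.zeros((2, 4))
gamma, alpha = 0.9, 0.005
for _ in range(20_000):
    qtd_update(theta, 0, 0.0, 1, False, alpha, gamma)
    qtd_update(theta, 1, float(rng.integers(0, 2)), None, True, alpha, gamma)

print(np.round(theta, 2))
```

Under these assumptions, the estimates for state 1 should settle near the quantiles of the Bernoulli reward (about 0, 0, 1, 1). The estimates for state 0 should settle near 0.9 times those values. Both keep fluctuating on the scale of the constant step size α.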

RiskQ: risk-sensitive multi-agent reinforcement learning value factorization

S Shen, C Ma, C Li, W Liu, Y Fu… - Advances in Neural …, 2023 - proceedings.neurips.cc
Multi-agent systems are characterized by environmental uncertainty, varying policies of
agents, and partial observability, which result in significant risks. In the context of Multi-Agent …

Uncertainty-aware reinforcement learning for risk-sensitive player evaluation in sports game

G Liu, Y Luo, O Schulte… - Advances in Neural …, 2022 - proceedings.neurips.cc
A major task of sports analytics is player evaluation. Previous methods commonly measured
the impact of players' actions on desirable outcomes (e.g., goals or winning) without …

The statistical benefits of quantile temporal-difference learning for value estimation

M Rowland, Y Tang, C Lyle, R Munos… - International …, 2023 - proceedings.mlr.press
We study the problem of temporal-difference-based policy evaluation in reinforcement
learning. In particular, we analyse the use of a distributional reinforcement learning …

Advanced reinforcement learning and its connections with brain neuroscience

C Fan, L Yao, J Zhang, Z Zhen, X Wu - Research, 2023 - spj.science.org
In recent years, brain science and neuroscience have greatly propelled the innovation of
computer science. In particular, knowledge from the neurobiology and neuropsychology of …

CVaR-Constrained Policy Optimization for Safe Reinforcement Learning

Q Zhang, S Leng, X Ma, Q Liu, X Wang… - … on Neural Networks …, 2024 - ieeexplore.ieee.org
Current constrained reinforcement learning (RL) methods guarantee constraint satisfaction
only in expectation, which is inadequate for safety-critical decision problems. Since a …
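A sample-based CVaR estimate shows why a constraint on expected cost can miss tail risk. This is a minimal sketch, not the paper's method. It assumes that CVaR at level α is the mean of the worst (largest) α-fraction of sampled episode costs; conventions for α vary across papers. The two cost samples are made-up illustrations, and empirical_cvar is a hypothetical helper.

```python
import numpy as np

def empirical_cvar(costs, alpha):
    """Mean of the worst (largest) alpha-fraction of sampled costs (hypothetical helper)."""
    costs = np.sort(np.asarray(costs, dtype=float))[::-1]   # largest costs first
    k = max(1, int(np.ceil(alpha * len(costs))))            # size of the tail
    return costs[:k].mean()

# Two made-up episode-cost samples with the same mean
safe = np.full(100, 1.0)                    # always costs 1
risky = np.array([0.0] * 90 + [10.0] * 10)  # usually 0, occasionally 10

for name, c in [("safe", safe), ("risky", risky)]:
    print(name, "mean:", c.mean(), "CVaR_0.25:", empirical_cvar(c, 0.25))
```

Both samples have a mean cost of 1.0, so a constraint such as "expected cost ≤ 1" accepts both. The CVaR at level 0.25 is 1.0 for the first sample and 4.0 for the second, so a CVaR constraint such as "CVaR ≤ 2" rejects the risky one.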

Deep non-crossing quantiles through the partial derivative

A Brando, BS Center… - International …, 2022 - proceedings.mlr.press
Quantile Regression (QR) provides a way to approximate a single conditional quantile. To
have a more informative description of the conditional distribution, QR can be merged with …
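Quantile regression fits a single τ-quantile by minimizing the pinball (check) loss ρ_τ(u) = max(τu, (τ − 1)u), with u = y − q. The sketch below checks this numerically on toy data by grid search. The data, the grid, and the function name are illustrative assumptions. The sketch does not show how the cited paper combines QR with deep networks.

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Average pinball loss of a constant prediction q at quantile level tau."""
    diff = y - q
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

y = np.arange(1, 11, dtype=float)          # toy data: 1, 2, ..., 10
candidates = np.linspace(0.0, 11.0, 1101)  # grid with spacing 0.01
losses = [pinball_loss(y, c, 0.25) for c in candidates]
best = candidates[int(np.argmin(losses))]
print(best)
```

For τ = 0.25 and the values 1 to 10, the loss decreases up to q = 3 and increases after it, so the grid minimizer is 3. By contrast, np.quantile with its default linear interpolation returns 3.25 for the same data, because it uses a different quantile definition.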

Monotonic quantile network for worst-case offline reinforcement learning

C Bai, T Xiao, Z Zhu, L Wang, F Zhou… - … on Neural Networks …, 2022 - ieeexplore.ieee.org
A key challenge in offline reinforcement learning (RL) is how to ensure the learned offline
policy is safe, especially in safety-critical domains. In this article, we focus on learning a …
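A generic way to keep quantile estimates monotonic (non-crossing) is to predict a base value plus strictly positive increments and then take their cumulative sum. The sketch below is an assumed illustration of that monotonicity constraint. It is not the architecture used in the cited paper, and the helper names are hypothetical.

```python
import numpy as np

def softplus(z):
    # Numerically stable log(1 + exp(z)); always positive
    return np.logaddexp(0.0, z)

def monotone_quantiles(raw):
    """Map unconstrained outputs raw[0..m-1] to increasing quantile values (hypothetical helper)."""
    raw = np.asarray(raw, dtype=float)
    increments = softplus(raw[1:])                 # strictly positive gaps
    return raw[0] + np.concatenate(([0.0], np.cumsum(increments)))

raw = np.array([0.5, -3.0, 2.0, -1.0])  # e.g. outputs of an unconstrained network head
q = monotone_quantiles(raw)
print(q)
```

Every softplus increment is positive, so the outputs are strictly increasing whatever values the unconstrained head produces. The raw values themselves are not sorted.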

Distributional reinforcement learning with monotonic splines

Y Luo, G Liu, H Duan, O Schulte… - … Conference on Learning …, 2021 - openreview.net
Distributional Reinforcement Learning (RL) differs from traditional RL by estimating the
distribution over returns to capture the intrinsic uncertainty of MDPs. One key challenge in …