Scalable safe policy improvement via Monte Carlo tree search

A Castellini, F Bianchi, E Zorzi… - International …, 2023 - proceedings.mlr.press
Algorithms for safely improving policies are important to deploy reinforcement learning
approaches in real-world scenarios. In this work, we propose an algorithm, called MCTS …

Towards a formal account on negative latency

C Dubslaff, J Schulz, P Wienhöft, C Baier… - … Conference on Bridging …, 2023 - Springer
Low latency communication is a major challenge when humans have to be integrated into
cyber physical systems with mixed realities. Recently, the concept of negative latency has …

What Are the Odds? Improving the foundations of Statistical Model Checking

T Meggendorfer, M Weininger, P Wienhöft - arXiv preprint arXiv …, 2024 - arxiv.org
Markov decision processes (MDPs) are a fundamental model for decision making under
uncertainty. They exhibit non-deterministic choice as well as probabilistic uncertainty …

Towards alignment of Reinforcement Learning agents; for consideration of safety, robustness and fairness.

H Satija - 2024 - escholarship.mcgill.ca
Reinforcement Learning (RL) has emerged as the standard paradigm for sequential
decision-making and a framework for general intelligence. At its core, the RL problem is one …

Safe Policy Improvement in POMDPs

MR Suilen, TD Simão, N Jansen - 2023 - repository.ubn.ru.nl
Reinforcement learning (RL) is the standard approach to solve sequential decision-making
problems when environment dynamics are unknown [9]. By interacting with the environment …