Scalable safe policy improvement via Monte Carlo tree search
Algorithms for safely improving policies are important to deploy reinforcement learning
approaches in real-world scenarios. In this work, we propose an algorithm, called MCTS …
Towards a formal account on negative latency
Low latency communication is a major challenge when humans have to be integrated into
cyber physical systems with mixed realities. Recently, the concept of negative latency has …
What Are the Odds? Improving the foundations of Statistical Model Checking
Markov decision processes (MDPs) are a fundamental model for decision making under
uncertainty. They exhibit non-deterministic choice as well as probabilistic uncertainty …
Towards alignment of Reinforcement Learning agents; for consideration of safety, robustness and fairness.
H Satija - 2024 - escholarship.mcgill.ca
Reinforcement Learning (RL) has emerged as the standard paradigm for sequential
decision-making and a framework for general intelligence. At its core, the RL problem is one …
Safe Policy Improvement in POMDPs
Reinforcement learning (RL) is the standard approach to solve sequential decision-making
problems when environment dynamics are unknown [9]. By interacting with the environment …