Safe policy improvement with baseline bootstrapping in factored environments

T Badings, TD Simão, M Suilen, N Jansen - International Journal on …, 2023 - Springer

This position paper reflects on the state-of-the-art in decision-making under uncertainty. A
classical assumption is that probabilities can sufficiently capture all uncertainty in a system …

Cited by 13 - Related articles - All 8 versions

Confidence-aware reinforcement learning for self-driving cars

Z Cao, S Xu, H Peng, D Yang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

Reinforcement learning (RL) can be used to design smart driving policies in complex
situations where traditional methods cannot. However, they are frequently black-box in …

Cited by 66 - Related articles - All 3 versions

[PDF] arxiv.org

Neural simplex architecture

DT Phan, R Grosu, N Jansen, N Paoletti… - NASA Formal Methods …, 2020 - Springer

We present the Neural Simplex Architecture (NSA), a new approach to runtime
assurance that provides safety guarantees for neural controllers (obtained, e.g., using …

Cited by 79 - Related articles - All 22 versions

[PDF] aaai.org

Safe policy improvement for POMDPs via finite-state controllers

TD Simão, M Suilen, N Jansen - … of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org

We study safe policy improvement (SPI) for partially observable Markov decision processes
(POMDPs). SPI is an offline reinforcement learning (RL) problem that assumes access to (1) …

Cited by 12 - Related articles - All 6 versions

[PDF] ru.nl

[PDF] AlwaysSafe: Reinforcement learning without safety constraint violations during training

TD Simão, N Jansen, MTJ Spaan - 2021 - repository.ubn.ru.nl

Deploying reinforcement learning (RL) involves major concerns around safety. Engineering
a reward signal that allows the agent to maximize its performance while remaining safe is …

Cited by 50 - Related articles - All 14 versions

[PDF] mlr.press

Scalable safe policy improvement via Monte Carlo tree search

A Castellini, F Bianchi, E Zorzi… - International …, 2023 - proceedings.mlr.press

Algorithms for safely improving policies are important to deploy reinforcement learning
approaches in real-world scenarios. In this work, we propose an algorithm, called MCTS …

Cited by 7 - Related articles - All 13 versions

[PDF] univr.it

Partially Observable Monte Carlo Planning with state variable constraints for mobile robot navigation

A Castellini, E Marchesini, A Farinelli - Engineering Applications of Artificial …, 2021 - Elsevier

Autonomous mobile robots employed in industrial applications often operate in complex and
uncertain environments. In this paper we propose an approach based on an extension of …

Cited by 19 - Related articles - All 4 versions

[HTML] nature.com

[HTML] Efficient and scalable reinforcement learning for large-scale network control

C Ma, A Li, Y Du, H Dong, Y Yang - Nature Machine Intelligence, 2024 - nature.com

The primary challenge in the development of large-scale artificial intelligence (AI) systems
lies in achieving scalable decision-making—extending the AI models while maintaining …

Safe policy improvement with soft baseline bootstrapping

K Nadjahi, R Laroche… - Machine Learning and …, 2020 - Springer

Batch Reinforcement Learning (Batch RL) consists in training a policy using
trajectories collected with another policy, called the behavioural policy. Safe policy …

Cited by 37 - Related articles - All 7 versions

[PDF] arxiv.org

Identify, estimate and bound the uncertainty of reinforcement learning for autonomous driving

W Zhou, Z Cao, N Deng, K Jiang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Deep reinforcement learning (DRL) has emerged as a promising approach for developing
more intelligent autonomous vehicles (AVs). A typical DRL application on AVs is to train a …

Cited by 8 - Related articles - All 5 versions