Safe policies for reinforcement learning via primal-dual methods

S Paternain, M Calvo-Fullana… - … on Automatic Control, 2022 - ieeexplore.ieee.org
In this article, we study the design of controllers in the context of stochastic optimal control
under the assumption that the model of the system is not available. This is, we aim to control …

Communication-efficient policy gradient methods for distributed reinforcement learning

T Chen, K Zhang, GB Giannakis… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
This article deals with distributed policy optimization in reinforcement learning, which
involves a central controller and a group of learners. In particular, two typical settings …

The Path to Defence: A Roadmap to Characterising Data Poisoning Attacks on Victim Models

T Chaalan, S Pang, J Kamruzzaman, I Gondal… - ACM Computing …, 2024 - dl.acm.org
Data Poisoning Attacks (DPA) represent a sophisticated technique aimed at distorting the
training data of machine learning models, thereby manipulating their behavior. This process …

Learning safe policies via primal-dual methods

S Paternain, M Calvo-Fullana… - 2019 IEEE 58th …, 2019 - ieeexplore.ieee.org
In this paper, we study the learning of safe policies in the setting of reinforcement learning
problems. This is, we aim to control a Markov Decision Process (MDP) of which we do not …

On the sample complexity and metastability of heavy-tailed policy search in continuous control

AS Bedi, A Parayil, J Zhang, M Wang… - Journal of Machine …, 2024 - jmlr.org
Reinforcement learning is a framework for interactive decision-making with incentives
sequentially revealed across time without a system dynamics model. Due to its scaling to …

Autonomous Driving Control for Passing Unsignalized Intersections Using the Semantic Segmentation Technique

J Tsai, YT Chang, ZY Chen, Z You - Electronics, 2024 - mdpi.com
Autonomous driving in urban areas is challenging because it requires understanding
vehicle movements, traffic rules, map topologies and unknown environments in the highly …

Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs

S Rozada, D Ding, AG Marques, A Ribeiro - arXiv preprint arXiv …, 2024 - arxiv.org
We study the problem of computing deterministic optimal policies for constrained Markov
decision processes (MDPs) with continuous state and action spaces, which are widely …

Exploring Gradient Explosion in Generative Adversarial Imitation Learning: A Probabilistic Perspective

W Wang, Y Zhu, Y Zhou, C Shen, J Tang, Z Xu… - Proceedings of the …, 2024 - ojs.aaai.org
Generative Adversarial Imitation Learning (GAIL) stands as a cornerstone approach in
imitation learning. This paper investigates the gradient explosion in two types of GAIL: GAIL …

Multilinear Tensor Low-Rank Approximation for Policy-Gradient Methods in Reinforcement Learning

S Rozada, HT Wai, AG Marques - arXiv preprint arXiv:2501.04879, 2025 - arxiv.org
Reinforcement learning (RL) aims to estimate the action to take given a (time-varying) state,
with the goal of maximizing a cumulative reward function. Predominantly, there are two …

Towards delivering a coherent self-contained explanation of proximal policy optimization

D Bick - 2021 - fse.studenttheses.ub.rug.nl
Reinforcement Learning (RL), and these days particularly Deep Reinforcement Learning
(DRL), is concerned with the development, study, and application of algorithms that are …