Stochastic policy gradient ascent in reproducing kernel Hilbert spaces

S Paternain, M Calvo-Fullana… - … on Automatic Control, 2022 - ieeexplore.ieee.org

In this article, we study the design of controllers in the context of stochastic optimal control
under the assumption that the model of the system is not available. That is, we aim to control …

Cited by 125 · Related articles · All 4 versions

[PDF] arxiv.org

Communication-efficient policy gradient methods for distributed reinforcement learning

T Chen, K Zhang, GB Giannakis… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

This article deals with distributed policy optimization in reinforcement learning, which
involves a central controller and a group of learners. In particular, two typical settings …

Cited by 79 · Related articles · All 6 versions

[PDF] acm.org

The Path to Defence: A Roadmap to Characterising Data Poisoning Attacks on Victim Models

T Chaalan, S Pang, J Kamruzzaman, I Gondal… - ACM Computing …, 2024 - dl.acm.org

Data Poisoning Attacks (DPA) represent a sophisticated technique aimed at distorting the
training data of machine learning models, thereby manipulating their behavior. This process …

Cited by 6 · Related articles · All 2 versions

[PDF] luizchamon.com

Learning safe policies via primal-dual methods

S Paternain, M Calvo-Fullana… - 2019 IEEE 58th …, 2019 - ieeexplore.ieee.org

In this paper, we study the learning of safe policies in the setting of reinforcement learning
problems. That is, we aim to control a Markov Decision Process (MDP) of which we do not …

Cited by 27 · Related articles · All 5 versions

[PDF] jmlr.org

On the sample complexity and metastability of heavy-tailed policy search in continuous control

AS Bedi, A Parayil, J Zhang, M Wang… - Journal of Machine …, 2024 - jmlr.org

Reinforcement learning is a framework for interactive decision-making with incentives
sequentially revealed across time without a system dynamics model. Due to its scaling to …

Cited by 13 · Related articles · All 3 versions

[PDF] mdpi.com

Autonomous Driving Control for Passing Unsignalized Intersections Using the Semantic Segmentation Technique

J Tsai, YT Chang, ZY Chen, Z You - Electronics, 2024 - mdpi.com

Autonomous driving in urban areas is challenging because it requires understanding
vehicle movements, traffic rules, map topologies and unknown environments in the highly …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs

S Rozada, D Ding, AG Marques, A Ribeiro - arXiv preprint arXiv …, 2024 - arxiv.org

We study the problem of computing deterministic optimal policies for constrained Markov
decision processes (MDPs) with continuous state and action spaces, which are widely …

Cited by 1 · Related articles · All 3 versions

[PDF] aaai.org

Exploring Gradient Explosion in Generative Adversarial Imitation Learning: A Probabilistic Perspective

W Wang, Y Zhu, Y Zhou, C Shen, J Tang, Z Xu… - Proceedings of the …, 2024 - ojs.aaai.org

Generative Adversarial Imitation Learning (GAIL) stands as a cornerstone approach in
imitation learning. This paper investigates the gradient explosion in two types of GAIL: GAIL …

Cited by 3 · Related articles · All 3 versions

[PDF] arxiv.org

Multilinear Tensor Low-Rank Approximation for Policy-Gradient Methods in Reinforcement Learning

S Rozada, HT Wai, AG Marques - arXiv preprint arXiv:2501.04879, 2025 - arxiv.org

Reinforcement learning (RL) aims to estimate the action to take given a (time-varying) state,
with the goal of maximizing a cumulative reward function. Predominantly, there are two …

Towards delivering a coherent self-contained explanation of proximal policy optimization

D Bick - 2021 - fse.studenttheses.ub.rug.nl

Reinforcement Learning (RL), and these days particularly Deep Reinforcement Learning
(DRL), is concerned with the development, study, and application of algorithms that are …

Cited by 7 · Related articles · All 2 versions