Zeroth-order deterministic policy gradient

F Huang, S Gao, J Pei, H Huang - Journal of Machine Learning Research, 2022 - jmlr.org

In the paper, we propose a class of accelerated zeroth-order and first-order momentum
methods for both nonconvex mini-optimization and minimax-optimization. Specifically, we …

Cited by 58 - Related articles - All 9 versions

Examining average and discounted reward optimality criteria in reinforcement learning

V Dewanto, M Gallagher - Australasian Joint Conference on Artificial …, 2022 - Springer

In reinforcement learning (RL), the goal is to obtain an optimal policy, for which the optimality
criterion is fundamentally important. Two major optimality criteria are average and …

Cited by 15 - Related articles - All 5 versions

Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs

S Rozada, D Ding, AG Marques, A Ribeiro - arXiv preprint arXiv …, 2024 - arxiv.org

We study the problem of computing deterministic optimal policies for constrained Markov
decision processes (MDPs) with continuous state and action spaces, which are widely …

Cited by 1 - Related articles - All 3 versions

Memory-Efficient Gradient Unrolling for Large-Scale Bi-level Optimization

Q Shen, Y Wang, Z Yang, X Li, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Bi-level optimization (BO) has become a fundamental mathematical framework for
addressing hierarchical machine learning problems. As deep learning models continue to …

Learning Optimal Deterministic Policies with Stochastic Policy Gradients

A Montenegro, M Mussi, AM Metelli… - arXiv preprint arXiv …, 2024 - arxiv.org

Policy gradient (PG) methods are successful approaches to deal with continuous
reinforcement learning (RL) problems. They learn stochastic parametric (hyper) policies by …

Cited by 1 - Related articles - All 3 versions

Distributed cooperative multi-agent reinforcement learning with directed coordination graph

G Jing, H Bai, J George, A Chakrabortty… - 2022 American …, 2022 - ieeexplore.ieee.org

Existing distributed cooperative multi-agent reinforcement learning (MARL) frameworks
usually assume undirected coordination graphs and communication graphs, while …

Cited by 5 - Related articles - All 4 versions

Model-free learning of optimal deterministic resource allocations in wireless systems via action-space exploration

H Hashmi, DS Kalogerias - 2021 IEEE 31st International …, 2021 - ieeexplore.ieee.org

Wireless systems resource allocation refers to perpetual and challenging nonconvex
constrained optimization tasks, which are especially timely in modern communications and …

Cited by 5 - Related articles - All 5 versions

Unmanned Vehicles in 6G Networks: A Unifying Treatment of Problems, Formulations, and Tools

W Hurst, S Evmorfos, A Petropulu, Y Mostofi - arXiv preprint arXiv …, 2024 - arxiv.org

Unmanned Vehicles (UVs) functioning as autonomous agents are anticipated to play a
crucial role in the 6th Generation of wireless networks. Their seamless integration, cost …

Cited by 2 - Related articles - All 2 versions

Model-Free Learning of Two-Stage Beamformers for Passive IRS-Aided Network Design

H Hashmi, S Pougkakiotis… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Electronically tunable metasurfaces, or Intelligent Reflecting Surfaces (IRSs), are a popular
technology for achieving high spectral efficiency in modern wireless systems by shaping …

Cited by 2 - Related articles - All 6 versions

Distributed Multi-Agent Reinforcement Learning Based on Graph-Induced Local Value Functions

G Jing, H Bai, J George, A Chakrabortty… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Achieving distributed reinforcement learning (RL) for large-scale cooperative multi-agent
systems (MASs) is challenging because: (i) each agent has access to only limited …

Cited by 1 - Related articles - All 4 versions