Accelerated zeroth-order and first-order momentum methods from mini to minimax optimization

F Huang, S Gao, J Pei, H Huang - Journal of Machine Learning Research, 2022 - jmlr.org
In the paper, we propose a class of accelerated zeroth-order and first-order momentum
methods for both nonconvex mini-optimization and minimax-optimization. Specifically, we …

Examining average and discounted reward optimality criteria in reinforcement learning

V Dewanto, M Gallagher - Australasian Joint Conference on Artificial …, 2022 - Springer
In reinforcement learning (RL), the goal is to obtain an optimal policy, for which the optimality
criterion is fundamentally important. Two major optimality criteria are average and …

Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs

S Rozada, D Ding, AG Marques, A Ribeiro - arXiv preprint arXiv …, 2024 - arxiv.org
We study the problem of computing deterministic optimal policies for constrained Markov
decision processes (MDPs) with continuous state and action spaces, which are widely …

Memory-Efficient Gradient Unrolling for Large-Scale Bi-level Optimization

Q Shen, Y Wang, Z Yang, X Li, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Bi-level optimization (BO) has become a fundamental mathematical framework for
addressing hierarchical machine learning problems. As deep learning models continue to …

Learning Optimal Deterministic Policies with Stochastic Policy Gradients

A Montenegro, M Mussi, AM Metelli… - arXiv preprint arXiv …, 2024 - arxiv.org
Policy gradient (PG) methods are successful approaches to deal with continuous
reinforcement learning (RL) problems. They learn stochastic parametric (hyper) policies by …

Distributed cooperative multi-agent reinforcement learning with directed coordination graph

G Jing, H Bai, J George, A Chakrabortty… - 2022 American …, 2022 - ieeexplore.ieee.org
Existing distributed cooperative multi-agent reinforcement learning (MARL) frameworks
usually assume undirected coordination graphs and communication graphs, while …

Model-free learning of optimal deterministic resource allocations in wireless systems via action-space exploration

H Hashmi, DS Kalogerias - 2021 IEEE 31st International …, 2021 - ieeexplore.ieee.org
Wireless systems resource allocation refers to perpetual and challenging nonconvex
constrained optimization tasks, which are especially timely in modern communications and …

Unmanned Vehicles in 6G Networks: A Unifying Treatment of Problems, Formulations, and Tools

W Hurst, S Evmorfos, A Petropulu, Y Mostofi - arXiv preprint arXiv …, 2024 - arxiv.org
Unmanned Vehicles (UVs) functioning as autonomous agents are anticipated to play a
crucial role in the 6th Generation of wireless networks. Their seamless integration, cost …

Model-Free Learning of Two-Stage Beamformers for Passive IRS-Aided Network Design

H Hashmi, S Pougkakiotis… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Electronically tunable metasurfaces, or Intelligent Reflecting Surfaces (IRSs), are a popular
technology for achieving high spectral efficiency in modern wireless systems by shaping …

Distributed Multi-Agent Reinforcement Learning Based on Graph-Induced Local Value Functions

G Jing, H Bai, J George, A Chakrabortty… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Achieving distributed reinforcement learning (RL) for large-scale cooperative multi-agent
systems (MASs) is challenging because: (i) each agent has access to only limited …