Finite-time analysis of Whittle index based Q-learning for restless multi-armed bandits with neural network function approximation

G Xiong, J Li - Advances in Neural Information Processing …, 2023 - proceedings.neurips.cc
Whittle index policy is a heuristic to the intractable restless multi-armed bandits (RMAB)
problem. Although it is provably asymptotically optimal, finding Whittle indices remains …

Constrained reinforcement learning using distributional representation for trustworthy quadrotor UAV tracking control

Y Wang, D Boyle - IEEE Transactions on Automation Science …, 2024 - ieeexplore.ieee.org
Simultaneously accurate and reliable tracking control for quadrotors in complex dynamic
environments is challenging. The chaotic nature of aerodynamics, derived from drag forces …

Target Network and Truncation Overcome the Deadly Triad in Q-Learning

Z Chen, JP Clarke, ST Maguluri - SIAM Journal on Mathematics of Data …, 2023 - SIAM
Q-learning with function approximation is one of the most empirically successful while
theoretically mysterious reinforcement learning (RL) algorithms and was identified in [RS …

Anti-Jamming Attack Mixed Strategy for Formation Tracking Control via Game-Theoretical Reinforcement Learning

L Xue, B Ma, Y Wu, J Liu, C Mu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Communication, the basis for the unmanned aerial vehicle (UAV) to exchange information
(e.g., displacement, velocity, or direction), plays a role in multi-UAV to perform formation …

Two-Timescale Q-Learning with Function Approximation in Zero-Sum Stochastic Games

Z Chen, K Zhang, E Mazumdar, A Ozdaglar… - arXiv preprint arXiv …, 2023 - arxiv.org
We consider two-player zero-sum stochastic games and propose a two-timescale Q-learning
algorithm with function approximation that is payoff-based, convergent, rational …

An Improved Finite-time Analysis of Temporal Difference Learning with Deep Neural Networks

Z Ke, Z Wen, J Zhang - arXiv preprint arXiv:2405.04017, 2024 - arxiv.org
Temporal difference (TD) learning algorithms with neural network function parameterization
have well-established empirical success in many practical large-scale reinforcement …

Finite-Time Error Analysis of Soft Q-Learning: Switching System Approach

N Jeong, D Lee - arXiv preprint arXiv:2403.06366, 2024 - arxiv.org
Soft Q-learning is a variation of Q-learning designed to solve entropy regularized Markov
decision problems where an agent aims to maximize the entropy regularized value function …