Neural temporal difference and Q-learning provably converge to global optima

Finite-time analysis of Whittle index based Q-learning for restless multi-armed bandits with neural network function approximation

G Xiong, J Li - Advances in Neural Information Processing …, 2023 - proceedings.neurips.cc

Whittle index policy is a heuristic to the intractable restless multi-armed bandits (RMAB)
problem. Although it is provably asymptotically optimal, finding Whittle indices remains …

Cited by 10 Related articles All 8 versions

Constrained reinforcement learning using distributional representation for trustworthy quadrotor UAV tracking control

Y Wang, D Boyle - IEEE Transactions on Automation Science …, 2024 - ieeexplore.ieee.org

Simultaneously accurate and reliable tracking control for quadrotors in complex dynamic
environments is challenging. The chaotic nature of aerodynamics, derived from drag forces …

Cited by 3 Related articles

Target Network and Truncation Overcome the Deadly Triad in Q-Learning

Z Chen, JP Clarke, ST Maguluri - SIAM Journal on Mathematics of Data …, 2023 - SIAM

Q-learning with function approximation is one of the most empirically successful while
theoretically mysterious reinforcement learning (RL) algorithms and was identified in [RS …

Cited by 19 Related articles All 3 versions

Anti-Jamming Attack Mixed Strategy for Formation Tracking Control via Game-Theoretical Reinforcement Learning

L Xue, B Ma, Y Wu, J Liu, C Mu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Communication, the basis for the unmanned aerial vehicle (UAV) to exchange information
(e.g., displacement, velocity, or direction), plays a role in multi-UAV to perform formation …
