Finite-time analysis of Whittle index based Q-learning for restless multi-armed bandits with neural network function approximation
Whittle index policy is a heuristic to the intractable restless multi-armed bandits (RMAB)
problem. Although it is provably asymptotically optimal, finding Whittle indices remains …
Constrained reinforcement learning using distributional representation for trustworthy quadrotor UAV tracking control
Simultaneously accurate and reliable tracking control for quadrotors in complex dynamic
environments is challenging. The chaotic nature of aerodynamics, derived from drag forces …
Target Network and Truncation Overcome the Deadly Triad in $Q$-Learning
$Q$-learning with function approximation is one of the most empirically successful while
theoretically mysterious reinforcement learning (RL) algorithms and was identified in [RS …
Anti-Jamming Attack Mixed Strategy for Formation Tracking Control via Game-Theoretical Reinforcement Learning
L Xue, B Ma, Y Wu, J Liu, C Mu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Communication, the basis for the unmanned aerial vehicle (UAV) to exchange information
(e.g., displacement, velocity, or direction), plays a role in multi-UAV to perform formation …
Two-Timescale Q-Learning with Function Approximation in Zero-Sum Stochastic Games
We consider two-player zero-sum stochastic games and propose a two-timescale $Q$-learning
algorithm with function approximation that is payoff-based, convergent, rational …
An Improved Finite-time Analysis of Temporal Difference Learning with Deep Neural Networks
Temporal difference (TD) learning algorithms with neural network function parameterization
have well-established empirical success in many practical large-scale reinforcement …
Finite-Time Error Analysis of Soft Q-Learning: Switching System Approach
Soft Q-learning is a variation of Q-learning designed to solve entropy regularized Markov
decision problems where an agent aims to maximize the entropy regularized value function …