Observe and look further: Achieving consistent performance on Atari

T Pohlen, B Piot, T Hester, MG Azar, D Horgan… - arXiv preprint arXiv …, 2018 - arxiv.org
Despite significant advances in the field of deep Reinforcement Learning (RL), today's
algorithms still fail to learn human-level policies consistently over a set of diverse tasks such …

Towards characterizing divergence in deep Q-learning

J Achiam, E Knight, P Abbeel - arXiv preprint arXiv:1903.08894, 2019 - arxiv.org
Deep Q-Learning (DQL), a family of temporal difference algorithms for control, employs three
techniques collectively known as the 'deadly triad' in reinforcement learning: bootstrapping, off …

Constrained deep Q-learning gradually approaching ordinary Q-learning

S Ohnishi, E Uchibe, Y Yamaguchi… - Frontiers in …, 2019 - frontiersin.org
A deep Q network (DQN) (Mnih et al.,) is an extension of Q learning, which is a typical deep
reinforcement learning method. In DQN, a Q function expresses all action values under all …

Vehicle to grid frequency regulation capacity optimal scheduling for battery swapping station using deep Q-network

X Wang, J Wang, J Liu - IEEE Transactions on Industrial …, 2020 - ieeexplore.ieee.org
Battery swapping stations (BSSs) are ideal candidates for fast frequency regulation services
(FFRS) due to their large battery stock capacity. In addition, BSSs can precharge batteries …

Two-stage WECC composite load modeling: A double deep Q-learning networks approach

X Wang, Y Wang, D Shi, J Wang… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
With the increasing complexity of modern power systems, conventional dynamic load
modeling with ZIP and induction motors (ZIP + IM) is no longer adequate to address the …

GRAC: Self-guided and self-regularized actor-critic

L Shao, Y You, M Yan, S Yuan… - Conference on Robot …, 2022 - proceedings.mlr.press
Deep reinforcement learning (DRL) algorithms have successfully been demonstrated on a
range of challenging decision making and control tasks. One dominant component of recent …

Adaptively calibrated critic estimates for deep reinforcement learning

N Dorka, T Welschehold, J Bödecker… - IEEE Robotics and …, 2022 - ieeexplore.ieee.org
Accurate value estimates are important for off-policy reinforcement learning. Algorithms
based on temporal difference learning typically are prone to an over- or underestimation bias …

Stabilizing deep Q-learning with Q-graph-based bounds

S Hoppe, M Giftthaler, R Krug… - … International Journal of …, 2023 - journals.sagepub.com
State-of-the-art deep reinforcement learning has enabled autonomous agents to learn
complex strategies from scratch on many problems including continuous control tasks. Deep …

Deep Q‐learning: A robust control approach

B Varga, B Kulcsár… - International Journal of …, 2023 - Wiley Online Library
This work aims at constructing a bridge between robust control theory and reinforcement
learning. Although reinforcement learning has shown admirable results in complex control …

Temporal Consistency-Based Loss Function for Both Deep Q-Networks and Deep Deterministic Policy Gradients for Continuous Actions

C Kim - Symmetry, 2021 - mdpi.com
Artificial intelligence (AI) techniques in power grid control and energy management in
building automation require both deep Q-networks (DQNs) and deep deterministic policy …