Policy learning with constraints in model-free reinforcement learning: A survey

Y Liu, A Halev, X Liu - The 30th international joint conference on artificial …, 2021 - par.nsf.gov
Reinforcement Learning (RL) algorithms have had tremendous success in simulated
domains. These algorithms, however, often cannot be directly applied to physical systems …

Understanding the impact of entropy on policy optimization

Z Ahmed, N Le Roux, M Norouzi… - … on machine learning, 2019 - proceedings.mlr.press
Entropy regularization is commonly used to improve policy optimization in reinforcement
learning. It is believed to help with exploration by encouraging the selection of more …

An application of deep reinforcement learning and vendor-managed inventory in perishable supply chain management

N Mohamadi, STA Niaki, M Taher… - Engineering Applications of …, 2024 - Elsevier
This article delves into the challenging supply chain management domain, explicitly
addressing the intricate issue of perishable inventory allocation within a two-echelon supply …

IRS-aided energy-efficient secure WBAN transmission based on deep reinforcement learning

L Xiao, S Hong, S Xu, H Yang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Wireless body area networks (WBANs) are vulnerable to active eavesdropping that
simultaneously performs sniffing and jamming to raise the sensor transmit power, and thus …

Optimizing adaptive notifications in mobile health interventions systems: reinforcement learning from a data-driven behavioral simulator

S Wang, C Zhang, B Kröse, H van Hoof - Journal of medical systems, 2021 - Springer
Mobile health (mHealth) intervention systems can employ adaptive strategies to interact with
users. Instead of designing such complex strategies manually, reinforcement learning (RL) …

Reinforcement learning architecture for cyber–physical–social AI: state-of-the-art and perspectives

X Li, P Wang, X Jin, Q Jiang, W Zhou, S Yao - Artificial Intelligence Review, 2023 - Springer
As the extension of cyber–physical systems (CPSs), cyber–physical–social systems
(CPSSs) seamlessly integrate cyber space, physical space, and social space. CPSS provide …

Benchmarking actor-critic deep reinforcement learning algorithms for robotics control with action constraints

K Kasaura, S Miura, T Kozuno… - IEEE Robotics and …, 2023 - ieeexplore.ieee.org
This study presents a benchmark for evaluating action-constrained reinforcement learning
(RL) algorithms. In action-constrained RL, each action taken by the learning system must …

Striving for simplicity and performance in off-policy DRL: Output normalization and non-uniform sampling

C Wang, Y Wu, Q Vuong… - … Conference on Machine …, 2020 - proceedings.mlr.press
We aim to develop off-policy DRL algorithms that not only exceed state-of-the-art
performance but are also simple and minimalistic. For standard continuous control …

Pseudo-labeled auto-curriculum learning for semi-supervised keypoint localization

C Wang, S Jin, Y Guan, W Liu, C Qian, P Luo… - arXiv preprint arXiv …, 2022 - arxiv.org
Localizing keypoints of an object is a basic visual problem. However, supervised learning of
a keypoint localization network often requires a large amount of data, which is expensive …

KnowGPT: Knowledge Graph based Prompting for Large Language Models

Q Zhang, J Dong, H Chen, D Zha, Z Yu… - The Thirty-eighth …, 2024 - openreview.net
Large Language Models (LLMs) have demonstrated remarkable capabilities in many real-world
applications. Nonetheless, LLMs are often criticized for their tendency to produce …