Using reward machines for high-level task specification and decomposition in reinforcement learning

Human-in-the-loop reinforcement learning: A survey and position on requirements, challenges, and opportunities

CO Retzlaff, S Das, C Wayllace, P Mousavi… - Journal of Artificial …, 2024 - jair.org

Artificial intelligence (AI) and especially reinforcement learning (RL) have the potential to
enable agents to learn and perform tasks autonomously with superhuman performance …

Cited by 41 · All 3 versions

Secure-by-construction synthesis of cyber-physical systems

S Liu, A Trivedi, X Yin, M Zamani - Annual Reviews in Control, 2022 - Elsevier

Correct-by-construction synthesis is a cornerstone of the confluence of formal methods and
control theory towards designing safety-critical systems. Instead of following the time-tested …

Cited by 50 · All 11 versions

Reward machines: Exploiting reward function structure in reinforcement learning

RT Icarte, TQ Klassen, R Valenzano… - Journal of Artificial …, 2022 - jair.org

Reinforcement learning (RL) methods usually treat reward functions as black boxes. As
such, these methods must extensively interact with the environment in order to discover …

Cited by 254 · All 10 versions
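
In this line of work, a reward machine is a finite-state machine. Its transitions fire on high-level propositions observed in the environment, and each transition emits a reward. This exposes the reward function's structure to the learner instead of treating it as a black box. The Python sketch below is illustrative only:

- The class name `RewardMachine`, its `step`/`run` methods, and the coffee-delivery task with propositions `c` and `o` are hypothetical.
- The rule that an unmatched label leaves the machine in place with zero reward is an assumption of this sketch, not taken from the paper.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# (next machine state, reward emitted on the transition)
Transition = Tuple[int, float]


class RewardMachine:
    """Hypothetical minimal reward machine: machine states are ints, and
    transitions are keyed by (state, proposition)."""

    def __init__(self, u0: int, terminal: FrozenSet[int],
                 delta: Dict[Tuple[int, str], Transition]):
        self.u0 = u0              # initial machine state
        self.terminal = terminal  # states where the task is finished
        self.delta = delta        # (state, proposition) -> (next state, reward)

    def step(self, u: int, true_props: FrozenSet[str]) -> Transition:
        """Advance on the set of propositions true after one environment step.
        Assumption of this sketch: if several propositions match, the
        alphabetically first wins; if none match, stay in u with reward 0."""
        for p in sorted(true_props):
            if (u, p) in self.delta:
                return self.delta[(u, p)]
        return u, 0.0

    def run(self, trace: Iterable[Set[str]]) -> Tuple[int, float]:
        """Feed a sequence of label sets; return (final state, total reward)."""
        u, total = self.u0, 0.0
        for props in trace:
            if u in self.terminal:
                break
            u, r = self.step(u, frozenset(props))
            total += r
        return u, total


# Hypothetical task: get coffee (c), then deliver it to the office (o).
# States: 0 = need coffee, 1 = carrying coffee, 2 = done (terminal).
rm = RewardMachine(
    u0=0,
    terminal=frozenset({2}),
    delta={(0, "c"): (1, 0.0), (1, "o"): (2, 1.0)},
)

# Visiting the office before getting coffee earns nothing; order matters.
print(rm.run([{"o"}, {"c"}, set(), {"o"}]))  # (2, 1.0)
print(rm.run([{"o"}, {"c"}]))                # (1, 0.0)
```

The reward depends on the machine state `u` as well as the environment state. An agent that conditions on the pair (environment state, `u`) therefore faces a Markovian reward again. The paper's learning methods exploit this kind of structure; the sketch above only shows the machine itself.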

LTL and Beyond: Formal Languages for Reward Function Specification in Reinforcement Learning

A Camacho, RT Icarte, TQ Klassen, RA Valenzano… - IJCAI, 2019 - ijcai.org

In Reinforcement Learning (RL), an agent is guided by the rewards it receives from
the reward function. Unfortunately, it may take many interactions with the environment to …

Cited by 268 · All 4 versions

On the expressivity of Markov reward

D Abel, W Dabney, A Harutyunyan… - Advances in …, 2021 - proceedings.neurips.cc

Reward is the driving force for reinforcement-learning agents. This paper is dedicated to
understanding the expressivity of reward as a way to capture tasks that we would want an …

Cited by 104 · All 12 versions

The perils of trial-and-error reward design: misdesign through overfitting and invalid task specifications

S Booth, WB Knox, J Shah, S Niekum, P Stone… - Proceedings of the …, 2023 - ojs.aaai.org

In reinforcement learning (RL), a reward function that aligns exactly with a task's true
performance metric is often necessarily sparse. For example, a true task metric might …

Cited by 66 · All 9 versions

Toward verified artificial intelligence

SA Seshia, D Sadigh, SS Sastry - Communications of the ACM, 2022 - dl.acm.org

Communications of the ACM, July 2022, Vol. 65, No. 7, p. 46 (contributed articles; illustration by Peter Crowther) …

Cited by 412 · All 8 versions

Explainable reinforcement learning via reward decomposition

Z Juozapaitis, A Koul, A Fern, M Erwig… - IJCAI/ECAI Workshop on …, 2019 - par.nsf.gov

We study reward decomposition for explaining the decisions of reinforcement learning (RL)
agents. The approach decomposes rewards into sums of semantically meaningful reward …

Cited by 246 · All 7 versions
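
The snippet describes splitting a reward into a sum of semantically meaningful components. That idea can be sketched with tabular Q-learning that keeps one Q-table per component and acts greedily on their sum. This is a minimal sketch under stated assumptions:

- The class `DecomposedQ`, the component names and the learning-rate and discount defaults are hypothetical.
- Bootstrapping every component from the same greedy next action is an assumption of this sketch; the paper's own method is not reproduced here and may differ in detail.

The `rdx` method reports, per component, how much one action is preferred over another. This is the kind of contrastive explanation reward decomposition is meant to support.

```python
import numpy as np


class DecomposedQ:
    """Hypothetical tabular Q-learning with one Q-table per reward component.
    Assumes the environment returns a vector of component rewards whose sum
    is the scalar task reward."""

    def __init__(self, n_states, n_actions, components, alpha=0.1, gamma=0.95):
        self.components = list(components)
        # q[c, s, a] is the component-c action value.
        self.q = np.zeros((len(self.components), n_states, n_actions))
        self.alpha, self.gamma = alpha, gamma

    def total_q(self, s):
        # Q(s, .) = sum over components c of Q_c(s, .)
        return self.q[:, s, :].sum(axis=0)

    def act(self, s):
        # Greedy on the summed Q-values (ties go to the lowest action index).
        return int(np.argmax(self.total_q(s)))

    def update(self, s, a, r_vec, s_next, done):
        # Assumption: all components bootstrap from one shared greedy action,
        # chosen with respect to the summed Q-values at s_next.
        a_star = self.act(s_next)
        for c in range(len(self.components)):
            future = 0.0 if done else self.gamma * self.q[c, s_next, a_star]
            target = r_vec[c] + future
            self.q[c, s, a] += self.alpha * (target - self.q[c, s, a])

    def rdx(self, s, a1, a2):
        """Per-component difference Q_c(s, a1) - Q_c(s, a2)."""
        diff = self.q[:, s, a1] - self.q[:, s, a2]
        return dict(zip(self.components, diff.tolist()))


agent = DecomposedQ(n_states=2, n_actions=2,
                    components=["progress", "safety"], alpha=0.5)
agent.update(s=0, a=0, r_vec=[1.0, -0.5], s_next=1, done=True)
print(agent.act(0))        # 0
print(agent.rdx(0, 0, 1))  # {'progress': 0.5, 'safety': -0.25}
```

After one terminal update, the explanation shows action 0 winning on `progress` but losing on `safety`. The per-component differences sum to the gap in total Q-values (0.25 here).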

A survey on interpretable reinforcement learning

C Glanois, P Weng, M Zimmer, D Li, T Yang, J Hao… - Machine Learning, 2024 - Springer

Although deep reinforcement learning has become a promising machine learning approach
for sequential decision-making problems, it is still not mature enough for high-stake domains …

Cited by 99 · All 3 versions

Compositional reinforcement learning from logical specifications

K Jothimurugan, S Bansal… - Advances in Neural …, 2021 - proceedings.neurips.cc

We study the problem of learning control policies for complex tasks given by logical
specifications. Recent approaches automatically generate a reward function from a given …

Cited by 98 · All 15 versions