Abstractions of general reinforcement learning

D Abel, A Barreto, B Van Roy… - Advances in …, 2024 - proceedings.neurips.cc

In a standard view of the reinforcement learning problem, an agent's goal is to efficiently
identify a policy that maximizes long-term reward. However, this perspective is based on a …

Cited by 44 · Related articles · All 8 versions

[PDF] mlr.press

Settling the reward hypothesis

M Bowling, JD Martin, D Abel… - … on Machine Learning, 2023 - proceedings.mlr.press

The reward hypothesis posits that, "all of what we mean by goals and purposes can be well
thought of as maximization of the expected value of the cumulative sum of a received scalar …

Cited by 31 · Related articles · All 8 versions

[PDF] neurips.cc

Self-predictive universal AI

E Catt, J Grau-Moya, M Hutter… - Advances in …, 2023 - proceedings.neurips.cc

Reinforcement Learning (RL) algorithms typically utilize learning and/or planning
techniques to derive effective policies. The integration of both approaches has proven to be …

Cited by 1 · Related articles · All 3 versions

[PDF] mlr.press

Conditions on Preference Relations that Guarantee the Existence of Optimal Policies

JC Carr, P Panangaden… - … Conference on Artificial …, 2024 - proceedings.mlr.press

Learning from Preferential Feedback (LfPF) plays an essential role in training Large
Language Models, as well as certain types of interactive learning agents. However, a …

Cited by 1 · Related articles · All 3 versions

[PDF] uni-hannover.de

State and action abstraction for search and reinforcement learning algorithms

A Dockhorn, R Kruse - Artificial Intelligence in Control and Decision …, 2023 - Springer

Decision-making in large and dynamic environments has always been a challenge for AI
agents. Given the multitude of available sensors in robotics and the rising complexity of …

Cited by 7 · Related articles · All 4 versions

[PDF] wordpress.com

[PDF] On Reward Binarisation and Bayesian Agents

E Catt, J Veness, M Hutter - European Workshop on …, 2022 - ewrl.wordpress.com

Reward binarisation is a common heuristically applied technique which can potentially
simplify a given reinforcement learning problem. However, this procedure done without care …

Cited by 1 · Related articles

[CITATION] On the Foundations of Universal Artificial Intelligence

E Catt - 2022