A definition of continual reinforcement learning
In a standard view of the reinforcement learning problem, an agent's goal is to efficiently
identify a policy that maximizes long-term reward. However, this perspective is based on a …
Settling the reward hypothesis
The reward hypothesis posits that "all of what we mean by goals and purposes can be well
thought of as maximization of the expected value of the cumulative sum of a received scalar …
Self-predictive universal AI
Reinforcement Learning (RL) algorithms typically utilize learning and/or planning
techniques to derive effective policies. The integration of both approaches has proven to be …
Conditions on Preference Relations that Guarantee the Existence of Optimal Policies
JC Carr, P Panangaden… - … Conference on Artificial …, 2024 - proceedings.mlr.press
Abstract Learning from Preferential Feedback (LfPF) plays an essential role in training Large
Language Models, as well as certain types of interactive learning agents. However, a …
State and action abstraction for search and reinforcement learning algorithms
A Dockhorn, R Kruse - Artificial Intelligence in Control and Decision …, 2023 - Springer
Decision-making in large and dynamic environments has always been a challenge for AI
agents. Given the multitude of available sensors in robotics and the rising complexity of …
On Reward Binarisation and Bayesian Agents
Reward binarisation is a common heuristically applied technique which can potentially
simplify a given reinforcement learning problem. However, this procedure done without care …