Human-in-the-loop: Provably efficient preference-based reinforcement learning with general function approximation
We study human-in-the-loop reinforcement learning (RL) with trajectory preferences, where
instead of receiving a numeric reward at each step, the RL agent only receives preferences …
A survey of reinforcement learning from human feedback
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …
Dueling rl: Reinforcement learning with trajectory preferences
We consider the problem of preference-based reinforcement learning (PbRL), where, unlike
traditional reinforcement learning (RL), an agent receives feedback only in terms of 1 bit …
Convex reinforcement learning in finite trials
Convex Reinforcement Learning (RL) is a recently introduced framework that generalizes
the standard RL objective to any convex (or concave) function of the state distribution …
Dueling rl: reinforcement learning with trajectory preferences
We consider the problem of preference based reinforcement learning (PbRL), where, unlike
traditional reinforcement learning, an agent receives feedback only in terms of a 1 bit (0/1) …
Making rl with preference-based feedback efficient via randomization
Reinforcement Learning algorithms that learn from human feedback (RLHF) need to be
efficient in terms of statistical complexity, computational complexity, and query complexity. In …
Ecological decision-making: From circuit elements to emerging principles
AM Hein - Current Opinion in Neurobiology, 2022 - Elsevier
The interactions an animal has with its prey, predators, neighbors, and competitors are
known as ecological interactions. Making effective decisions during ecological interactions …
Submodular reinforcement learning
In reinforcement learning (RL), rewards of states are typically considered additive, and
following the Markov assumption, they are $\textit{independent}$ of states visited …
Learning long-term reward redistribution via randomized return decomposition
Many practical applications of reinforcement learning require agents to learn from sparse
and delayed rewards. It challenges the ability of agents to attribute their actions to future …
Challenging common assumptions in convex reinforcement learning
M Mutti, R De Santi… - Advances in Neural …, 2022 - proceedings.neurips.cc
The classic Reinforcement Learning (RL) formulation concerns the maximization of
a scalar reward function. More recently, convex RL has been introduced to extend the RL …