Human-in-the-loop: Provably efficient preference-based reinforcement learning with general function approximation

X Chen, H Zhong, Z Yang, Z Wang… - … on Machine Learning, 2022 - proceedings.mlr.press
We study human-in-the-loop reinforcement learning (RL) with trajectory preferences, where
instead of receiving a numeric reward at each step, the RL agent only receives preferences …

A survey of reinforcement learning from human feedback

T Kaufmann, P Weng, V Bengs… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …

Dueling rl: Reinforcement learning with trajectory preferences

A Saha, A Pacchiano, J Lee - International Conference on …, 2023 - proceedings.mlr.press
We consider the problem of preference-based reinforcement learning (PbRL), where, unlike
traditional reinforcement learning (RL), an agent receives feedback only in terms of 1 bit …

Convex reinforcement learning in finite trials

M Mutti, R De Santi, P De Bartolomeis… - Journal of Machine …, 2023 - jmlr.org
Convex Reinforcement Learning (RL) is a recently introduced framework that generalizes
the standard RL objective to any convex (or concave) function of the state distribution …

Dueling rl: reinforcement learning with trajectory preferences

A Pacchiano, A Saha, J Lee - arXiv preprint arXiv:2111.04850, 2021 - arxiv.org
We consider the problem of preference based reinforcement learning (PbRL), where, unlike
traditional reinforcement learning, an agent receives feedback only in terms of a 1 bit (0/1) …

Making rl with preference-based feedback efficient via randomization

R Wu, W Sun - arXiv preprint arXiv:2310.14554, 2023 - arxiv.org
Reinforcement Learning algorithms that learn from human feedback (RLHF) need to be
efficient in terms of statistical complexity, computational complexity, and query complexity. In …

Ecological decision-making: From circuit elements to emerging principles

AM Hein - Current Opinion in Neurobiology, 2022 - Elsevier
The interactions an animal has with its prey, predators, neighbors, and competitors are
known as ecological interactions. Making effective decisions during ecological interactions …

Submodular reinforcement learning

M Prajapat, M Mutný, MN Zeilinger… - arXiv preprint arXiv …, 2023 - arxiv.org
In reinforcement learning (RL), rewards of states are typically considered additive, and
following the Markov assumption, they are $\textit {independent} $ of states visited …

Learning long-term reward redistribution via randomized return decomposition

Z Ren, R Guo, Y Zhou, J Peng - arXiv preprint arXiv:2111.13485, 2021 - arxiv.org
Many practical applications of reinforcement learning require agents to learn from sparse
and delayed rewards. It challenges the ability of agents to attribute their actions to future …

Challenging common assumptions in convex reinforcement learning

M Mutti, R De Santi… - Advances in Neural …, 2022 - proceedings.neurips.cc
Abstract The classic Reinforcement Learning (RL) formulation concerns the maximization of
a scalar reward function. More recently, convex RL has been introduced to extend the RL …