On the theory of reinforcement learning with once-per-episode feedback

Human-in-the-loop: Provably efficient preference-based reinforcement learning with general function approximation

X Chen, H Zhong, Z Yang, Z Wang… - … on Machine Learning, 2022 - proceedings.mlr.press

We study human-in-the-loop reinforcement learning (RL) with trajectory preferences, where
instead of receiving a numeric reward at each step, the RL agent only receives preferences …

Cited by 50 · Related articles · All 5 versions

A survey of reinforcement learning from human feedback

T Kaufmann, P Weng, V Bengs… - arXiv preprint arXiv …, 2023 - arxiv.org

Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …

Cited by 56 · Related articles · All 4 versions

Dueling RL: Reinforcement learning with trajectory preferences

A Saha, A Pacchiano, J Lee - International Conference on …, 2023 - proceedings.mlr.press

We consider the problem of preference-based reinforcement learning (PbRL), where, unlike
traditional reinforcement learning (RL), an agent receives feedback only in terms of 1 bit …

Cited by 26 · Related articles

Convex reinforcement learning in finite trials

M Mutti, R De Santi, P De Bartolomeis… - Journal of Machine …, 2023 - jmlr.org

Convex Reinforcement Learning (RL) is a recently introduced framework that generalizes
the standard RL objective to any convex (or concave) function of the state distribution …

Cited by 10 · Related articles · All 5 versions

Dueling RL: reinforcement learning with trajectory preferences

A Pacchiano, A Saha, J Lee - arXiv preprint arXiv:2111.04850, 2021 - arxiv.org

We consider the problem of preference based reinforcement learning (PbRL), where, unlike
traditional reinforcement learning, an agent receives feedback only in terms of a 1 bit (0/1) …

Cited by 47 · Related articles · All 2 versions

Making RL with preference-based feedback efficient via randomization

R Wu, W Sun - arXiv preprint arXiv:2310.14554, 2023 - arxiv.org

Reinforcement Learning algorithms that learn from human feedback (RLHF) need to be
efficient in terms of statistical complexity, computational complexity, and query complexity. In …

Cited by 15 · Related articles · All 3 versions

Ecological decision-making: From circuit elements to emerging principles

AM Hein - Current Opinion in Neurobiology, 2022 - Elsevier

The interactions an animal has with its prey, predators, neighbors, and competitors are
known as ecological interactions. Making effective decisions during ecological interactions …

Cited by 8 · Related articles · All 4 versions

Submodular reinforcement learning

M Prajapat, M Mutný, MN Zeilinger… - arXiv preprint arXiv …, 2023 - arxiv.org

In reinforcement learning (RL), rewards of states are typically considered additive, and
following the Markov assumption, they are independent of states visited …

Cited by 8 · Related articles · All 5 versions

Learning long-term reward redistribution via randomized return decomposition

Z Ren, R Guo, Y Zhou, J Peng - arXiv preprint arXiv:2111.13485, 2021 - arxiv.org

Many practical applications of reinforcement learning require agents to learn from sparse
and delayed rewards. It challenges the ability of agents to attribute their actions to future …

Cited by 29 · Related articles · All 7 versions

Challenging common assumptions in convex reinforcement learning

M Mutti, R De Santi… - Advances in Neural …, 2022 - proceedings.neurips.cc

The classic Reinforcement Learning (RL) formulation concerns the maximization of
a scalar reward function. More recently, convex RL has been introduced to extend the RL …

Cited by 17 · Related articles · All 9 versions