A Bayesian approach for policy learning from trajectory preference queries

F Ingrand, M Ghallab - Artificial Intelligence, 2017 - Elsevier

Autonomous robots facing a diversity of open environments and performing a variety of tasks
and interactions need explicit deliberation in order to fulfill their missions. Deliberation is …

Cited by 419 - Related articles - All 9 versions

[PDF] ai-plans.com

[PDF] Nash learning from human feedback

R Munos, M Valko, D Calandriello, MG Azar… - arXiv preprint arXiv …, 2023 - ai-plans.com

Large language models (LLMs) (Anil et al., 2023; Glaese et al., 2022; OpenAI, 2023; Ouyang
et al., 2022) have made remarkable strides in enhancing natural language understanding …

Cited by 84 - Related articles - All 5 versions

[PDF] neurips.cc

Deep reinforcement learning from human preferences

PF Christiano, J Leike, T Brown… - Advances in neural …, 2017 - proceedings.neurips.cc

For sophisticated reinforcement learning (RL) systems to interact usefully with real-world
environments, we need to communicate complex goals to these systems. In this work, we …

Cited by 3284 - Related articles - All 14 versions

[PDF] mlr.press

Few-shot preference learning for human-in-the-loop RL

DJ Hejna III, D Sadigh - Conference on Robot Learning, 2023 - proceedings.mlr.press

While reinforcement learning (RL) has become a more popular approach for robotics,
designing sufficiently informative reward functions for complex tasks has proven to be …

Cited by 86 - Related articles - All 6 versions

[PDF] neurips.cc

Reward learning from human preferences and demonstrations in Atari

B Ibarz, J Leike, T Pohlen, G Irving… - Advances in neural …, 2018 - proceedings.neurips.cc

To solve complex real-world problems with reinforcement learning, we cannot rely on
manually specified reward functions. Instead, we need humans to communicate an objective …

Cited by 442 - Related articles - All 7 versions

[PDF] neurips.cc

Inverse preference learning: Preference-based RL without a reward function

J Hejna, D Sadigh - Advances in Neural Information …, 2024 - proceedings.neurips.cc

Reward functions are difficult to design and often hard to align with human intent. Preference-
based Reinforcement Learning (RL) algorithms address these problems by learning reward …

Cited by 44 - Related articles - All 9 versions

[PDF] springer.com

Interactive machine learning for health informatics: when do we need the human-in-the-loop?

A Holzinger - Brain informatics, 2016 - Springer

Machine learning (ML) is the fastest growing field in computer science, and health
informatics is among the greatest challenges. The goal of ML is to develop algorithms which …

Cited by 1037 - Related articles - All 17 versions

[PDF] jmlr.org

A survey of preference-based reinforcement learning methods

C Wirth, R Akrour, G Neumann, J Fürnkranz - Journal of Machine Learning …, 2017 - jmlr.org

Reinforcement learning (RL) techniques optimize the accumulated long-term reward of a
suitably chosen reward function. However, designing such a reward function often requires …

Cited by 430 - Related articles - All 10 versions

[PDF] arxiv.org

Contrastive preference learning: Learning from human feedback without RL

J Hejna, R Rafailov, H Sikchi, C Finn, S Niekum… - arXiv preprint arXiv …, 2023 - arxiv.org

Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular
paradigm for aligning models with human intent. Typically RLHF algorithms operate in two …

Cited by 56 - Related articles - All 5 versions

[PDF] neurips.cc

On the expressivity of Markov reward

D Abel, W Dabney, A Harutyunyan… - Advances in …, 2021 - proceedings.neurips.cc

Reward is the driving force for reinforcement-learning agents. This paper is dedicated to
understanding the expressivity of reward as a way to capture tasks that we would want an …

Cited by 105 - Related articles - All 12 versions