Deliberation for autonomous robots: A survey

F Ingrand, M Ghallab - Artificial Intelligence, 2017 - Elsevier
Autonomous robots facing a diversity of open environments and performing a variety of tasks
and interactions need explicit deliberation in order to fulfill their missions. Deliberation is …

Nash learning from human feedback

R Munos, M Valko, D Calandriello, MG Azar… - arXiv preprint arXiv …, 2023 - ai-plans.com
Large language models (LLMs) (Anil et al., 2023; Glaese et al., 2022; OpenAI, 2023; Ouyang
et al., 2022) have made remarkable strides in enhancing natural language understanding …

Deep reinforcement learning from human preferences

PF Christiano, J Leike, T Brown… - Advances in neural …, 2017 - proceedings.neurips.cc
For sophisticated reinforcement learning (RL) systems to interact usefully with real-world
environments, we need to communicate complex goals to these systems. In this work, we …

Few-shot preference learning for human-in-the-loop RL

DJ Hejna III, D Sadigh - Conference on Robot Learning, 2023 - proceedings.mlr.press
While reinforcement learning (RL) has become a more popular approach for robotics,
designing sufficiently informative reward functions for complex tasks has proven to be …

Reward learning from human preferences and demonstrations in Atari

B Ibarz, J Leike, T Pohlen, G Irving… - Advances in neural …, 2018 - proceedings.neurips.cc
To solve complex real-world problems with reinforcement learning, we cannot rely on
manually specified reward functions. Instead, we need humans to communicate an objective …

Inverse preference learning: Preference-based RL without a reward function

J Hejna, D Sadigh - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Reward functions are difficult to design and often hard to align with human intent. Preference-
based Reinforcement Learning (RL) algorithms address these problems by learning reward …

Interactive machine learning for health informatics: when do we need the human-in-the-loop?

A Holzinger - Brain informatics, 2016 - Springer
Machine learning (ML) is the fastest growing field in computer science, and health
informatics is among the greatest challenges. The goal of ML is to develop algorithms which …

A survey of preference-based reinforcement learning methods

C Wirth, R Akrour, G Neumann, J Fürnkranz - Journal of Machine Learning …, 2017 - jmlr.org
Reinforcement learning (RL) techniques optimize the accumulated long-term reward of a
suitably chosen reward function. However, designing such a reward function often requires …

Contrastive preference learning: Learning from human feedback without RL

J Hejna, R Rafailov, H Sikchi, C Finn, S Niekum… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular
paradigm for aligning models with human intent. Typically RLHF algorithms operate in two …

On the expressivity of Markov reward

D Abel, W Dabney, A Harutyunyan… - Advances in …, 2021 - proceedings.neurips.cc
Reward is the driving force for reinforcement-learning agents. This paper is dedicated to
understanding the expressivity of reward as a way to capture tasks that we would want an …