Learning multimodal rewards from rankings

A Kalinowska, PM Pilarski… - Annual review of control …, 2023 - annualreviews.org

Early research on physical human–robot interaction (pHRI) has necessarily focused on
device design—the creation of compliant and sensorized hardware, such as exoskeletons …

Cited by 27 - Related articles - All 4 versions

[PDF] arxiv.org

Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org

Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …

Cited by 345 - Related articles - All 6 versions

[PDF] neurips.cc

RoboCLIP: One demonstration is enough to learn robot policies

S Sontakke, J Zhang, S Arnold… - Advances in …, 2024 - proceedings.neurips.cc

Reward specification is a notoriously difficult problem in reinforcement learning, requiring
extensive expert supervision to design robust reward functions. Imitation learning (IL) …

Cited by 36 - Related articles - All 7 versions

[PDF] mlr.press

Few-shot preference learning for human-in-the-loop RL

DJ Hejna III, D Sadigh - Conference on Robot Learning, 2023 - proceedings.mlr.press

While reinforcement learning (RL) has become a more popular approach for robotics,
designing sufficiently informative reward functions for complex tasks has proven to be …

Cited by 79 - Related articles - All 6 versions

[PDF] neurips.cc

Inverse preference learning: Preference-based RL without a reward function

J Hejna, D Sadigh - Advances in Neural Information …, 2024 - proceedings.neurips.cc

Reward functions are difficult to design and often hard to align with human intent. Preference-
based Reinforcement Learning (RL) algorithms address these problems by learning reward …

Cited by 32 - Related articles - All 9 versions

[PDF] nowpublishers.com

Interactive imitation learning in robotics: A survey

C Celemin, R Pérez-Dattari, E Chisari… - … and Trends® in …, 2022 - nowpublishers.com

Cited by 47 - Related articles - All 8 versions

[PDF] arxiv.org

A survey of reinforcement learning from human feedback

T Kaufmann, P Weng, V Bengs… - arXiv preprint arXiv …, 2023 - arxiv.org

Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …

Cited by 75 - Related articles - All 4 versions

[PDF] thecvf.com

Promptable behaviors: Personalizing multi-objective rewards from human preferences

M Hwang, L Weihs, C Park, K Lee… - Proceedings of the …, 2024 - openaccess.thecvf.com

Customizing robotic behaviors to be aligned with diverse human preferences is an
underexplored challenge in the field of embodied AI. In this paper we present Promptable …

Cited by 7 - Related articles - All 4 versions

[PDF] ieee.org

Guided reinforcement learning: A review and evaluation for efficient and effective real-world robotics [survey]

J Eßer, N Bach, C Jestel, O Urbann… - IEEE Robotics & …, 2022 - ieeexplore.ieee.org

Recent successes aside, reinforcement learning (RL) still faces significant challenges in its
application to the real-world robotics domain. Guiding the learning process with additional …

Cited by 14 - Related articles

[PDF] researchgate.net

Active preference-based Gaussian process regression for reward learning and optimization

E Bıyık, N Huynh, MJ Kochenderfer… - … Journal of Robotics …, 2024 - journals.sagepub.com

Designing reward functions is a difficult task in AI and robotics. The complex task of directly
specifying all the desirable behaviors a robot needs to optimize often proves challenging for …

Cited by 10 - Related articles - All 6 versions