AI alignment: A comprehensive survey
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …
A survey of imitation learning: Algorithms, recent developments, and challenges
M Zare, PM Kebria, A Khosravi… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
In recent years, the development of robotics and artificial intelligence (AI) systems has been
nothing short of remarkable. As these systems continue to evolve, they are being utilized in …
Open problems and fundamental limitations of reinforcement learning from human feedback
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …
Getting aligned on representational alignment
Biological and artificial information processing systems form representations that they can
use to categorize, reason, plan, navigate, and make decisions. How can we measure the …
Few-shot preference learning for human-in-the-loop RL
DJ Hejna III, D Sadigh - Conference on Robot Learning, 2023 - proceedings.mlr.press
While reinforcement learning (RL) has become a more popular approach for robotics,
designing sufficiently informative reward functions for complex tasks has proven to be …
Inverse preference learning: Preference-based RL without a reward function
Reward functions are difficult to design and often hard to align with human intent. Preference-
based Reinforcement Learning (RL) algorithms address these problems by learning reward …
The effects of reward misspecification: Mapping and mitigating misaligned models
Reward hacking--where RL agents exploit gaps in misspecified reward functions--has been
widely observed, but not yet systematically studied. To understand how reward hacking …
A review of robot learning for manipulation: Challenges, representations, and algorithms
A key challenge in intelligent robotics is creating robots that are capable of directly
interacting with the world around them to achieve their goals. The last decade has seen …
Contrastive preference learning: Learning from human feedback without RL
Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular
paradigm for aligning models with human intent. Typically RLHF algorithms operate in two …
Interactive imitation learning in robotics: A survey