Policy shaping: Integrating human feedback with reinforcement learning

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Cited by 133 · Related articles · All 3 versions

Reinforcement learning in healthcare: A survey

C Yu, J Liu, S Nemati, G Yin - ACM Computing Surveys (CSUR), 2021 - dl.acm.org

As a subfield of machine learning, reinforcement learning (RL) aims at optimizing decision
making by using interaction samples of an agent with its environment and the potentially …

Cited by 662 · Related articles · All 5 versions

Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org

Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …

Cited by 292 · Related articles · All 6 versions

Do the rewards justify the means? Measuring trade-offs between rewards and ethical behavior in the MACHIAVELLI benchmark

A Pan, JS Chan, A Zou, N Li, S Basart… - International …, 2023 - proceedings.mlr.press

Artificial agents have traditionally been trained to maximize reward, which may incentivize
power-seeking and deception, analogous to how next-token prediction in language models …

Cited by 100 · Related articles · All 6 versions

The curse of recursion: Training on generated data makes models forget

I Shumailov, Z Shumaylov, Y Zhao, Y Gal… - arXiv preprint arXiv …, 2023 - arxiv.org

Stable Diffusion revolutionised image creation from descriptive text. GPT-2, GPT-3(.5) and
GPT-4 demonstrated astonishing performance across a variety of language tasks. ChatGPT …

Cited by 213 · Related articles · All 4 versions

Curriculum learning for reinforcement learning domains: A framework and survey

S Narvekar, B Peng, M Leonetti, J Sinapov… - Journal of Machine …, 2020 - jmlr.org

Reinforcement learning (RL) is a popular paradigm for addressing sequential decision tasks
in which the agent has only limited environmental feedback. Despite many advances over …

Cited by 517 · Related articles · All 11 versions

In situ bidirectional human-robot value alignment

L Yuan, X Gao, Z Zheng, M Edmonds, YN Wu… - Science robotics, 2022 - science.org

A prerequisite for social coordination is bidirectional communication between teammates,
each playing two roles simultaneously: as receptive listeners and expressive speakers. For …

Cited by 81 · Related articles · All 6 versions

Generative artificial intelligence

L Banh, G Strobel - Electronic Markets, 2023 - Springer

Recent developments in the field of artificial intelligence (AI) have enabled new paradigms
of machine processing, shifting from data-driven, discriminative AI tasks toward …

Cited by 93 · Related articles · All 5 versions

Open sesame! Universal black box jailbreaking of large language models

R Lapid, R Langberg, M Sipper - arXiv preprint arXiv:2309.01446, 2023 - arxiv.org

Large language models (LLMs), designed to provide helpful and safe responses, often rely
on alignment techniques to align with user intent and social guidelines. Unfortunately, this …

Cited by 93 · Related articles · All 5 versions

A survey on transfer learning for multiagent reinforcement learning systems

FL Da Silva, AHR Costa - Journal of Artificial Intelligence Research, 2019 - jair.org

Multiagent Reinforcement Learning (RL) solves complex tasks that require coordination with
other agents through autonomous exploration of the environment. However, learning a …

Cited by 305 · Related articles · All 10 versions