AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Reinforcement learning in healthcare: A survey

C Yu, J Liu, S Nemati, G Yin - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
As a subfield of machine learning, reinforcement learning (RL) aims at optimizing decision
making by using interaction samples of an agent with its environment and the potentially …

Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …

Do the rewards justify the means? Measuring trade-offs between rewards and ethical behavior in the MACHIAVELLI benchmark

A Pan, JS Chan, A Zou, N Li, S Basart… - International …, 2023 - proceedings.mlr.press
Artificial agents have traditionally been trained to maximize reward, which may incentivize
power-seeking and deception, analogous to how next-token prediction in language models …

The curse of recursion: Training on generated data makes models forget

I Shumailov, Z Shumaylov, Y Zhao, Y Gal… - arXiv preprint arXiv …, 2023 - arxiv.org
Stable Diffusion revolutionised image creation from descriptive text. GPT-2, GPT-3(.5) and
GPT-4 demonstrated astonishing performance across a variety of language tasks. ChatGPT …

Curriculum learning for reinforcement learning domains: A framework and survey

S Narvekar, B Peng, M Leonetti, J Sinapov… - Journal of Machine …, 2020 - jmlr.org
Reinforcement learning (RL) is a popular paradigm for addressing sequential decision tasks
in which the agent has only limited environmental feedback. Despite many advances over …

In situ bidirectional human-robot value alignment

L Yuan, X Gao, Z Zheng, M Edmonds, YN Wu… - Science Robotics, 2022 - science.org
A prerequisite for social coordination is bidirectional communication between teammates,
each playing two roles simultaneously: as receptive listeners and expressive speakers. For …

Generative artificial intelligence

L Banh, G Strobel - Electronic Markets, 2023 - Springer
Recent developments in the field of artificial intelligence (AI) have enabled new paradigms
of machine processing, shifting from data-driven, discriminative AI tasks toward …

Open Sesame! Universal black box jailbreaking of large language models

R Lapid, R Langberg, M Sipper - arXiv preprint arXiv:2309.01446, 2023 - arxiv.org
Large language models (LLMs), designed to provide helpful and safe responses, often rely
on alignment techniques to align with user intent and social guidelines. Unfortunately, this …

A survey on transfer learning for multiagent reinforcement learning systems

FL Da Silva, AHR Costa - Journal of Artificial Intelligence Research, 2019 - jair.org
Multiagent Reinforcement Learning (RL) solves complex tasks that require coordination with
other agents through autonomous exploration of the environment. However, learning a …