AI alignment: A comprehensive survey
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …
Reinforcement learning in healthcare: A survey
As a subfield of machine learning, reinforcement learning (RL) aims at optimizing decision
making by using interaction samples of an agent with its environment and the potentially …
Open problems and fundamental limitations of reinforcement learning from human feedback
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …
Do the rewards justify the means? Measuring trade-offs between rewards and ethical behavior in the MACHIAVELLI benchmark
Artificial agents have traditionally been trained to maximize reward, which may incentivize
power-seeking and deception, analogous to how next-token prediction in language models …
The curse of recursion: Training on generated data makes models forget
Stable Diffusion revolutionised image creation from descriptive text. GPT-2, GPT-3(.5) and
GPT-4 demonstrated astonishing performance across a variety of language tasks. ChatGPT …
Curriculum learning for reinforcement learning domains: A framework and survey
Reinforcement learning (RL) is a popular paradigm for addressing sequential decision tasks
in which the agent has only limited environmental feedback. Despite many advances over …
In situ bidirectional human-robot value alignment
A prerequisite for social coordination is bidirectional communication between teammates,
each playing two roles simultaneously: as receptive listeners and expressive speakers. For …
Open sesame! Universal black box jailbreaking of large language models
Large language models (LLMs), designed to provide helpful and safe responses, often rely
on alignment techniques to align with user intent and social guidelines. Unfortunately, this …
A survey on transfer learning for multiagent reinforcement learning systems
FL Da Silva, AHR Costa - Journal of Artificial Intelligence Research, 2019 - jair.org
Multiagent Reinforcement Learning (RL) solves complex tasks that require coordination with
other agents through autonomous exploration of the environment. However, learning a …