AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Building machines that learn and think with people

KM Collins, I Sucholutsky, U Bhatt, K Chandra… - Nature Human …, 2024 - nature.com
What do we want from machine intelligence? We envision machines that are not just tools
for thought but partners in thought: reasonable, insightful, knowledgeable, reliable and …

Machine theory of mind

N Rabinowitz, F Perbet, F Song… - International …, 2018 - proceedings.mlr.press
Theory of mind (ToM) broadly refers to humans' ability to represent the mental
states of others, including their desires, beliefs, and intentions. We design a Theory of Mind …

Modeling others using oneself in multi-agent reinforcement learning

R Raileanu, E Denton, A Szlam… - … conference on machine …, 2018 - proceedings.mlr.press
We consider the multi-agent reinforcement learning setting with imperfect information. The
reward function depends on the hidden goals of both agents, so the agents must infer the …

Safe imitation learning via fast bayesian reward inference from preferences

D Brown, R Coleman, R Srinivasan… - … on Machine Learning, 2020 - proceedings.mlr.press
Bayesian reward learning from demonstrations enables rigorous safety and uncertainty
analysis when performing imitation learning. However, Bayesian reward learning methods …

Verification for machine learning, autonomy, and neural networks survey

W Xiang, P Musau, AA Wild, DM Lopez… - arXiv preprint arXiv …, 2018 - arxiv.org
This survey presents an overview of verification techniques for autonomous systems, with a
focus on safety-critical autonomous cyber-physical systems (CPS) and subcomponents …

Human-in-the-loop imitation learning using remote teleoperation

A Mandlekar, D Xu, R Martín-Martín, Y Zhu… - arXiv preprint arXiv …, 2020 - arxiv.org
Imitation Learning is a promising paradigm for learning complex robot manipulation skills by
reproducing behavior from human demonstrations. However, manipulation tasks often …

Validating metrics for reward alignment in human-autonomy teaming

L Sanneman, JA Shah - Computers in Human Behavior, 2023 - Elsevier
Alignment of human and autonomous agent values and objectives is vital in human-autonomy
teaming settings which require collaborative action toward a common goal. In …

Cognitive science as a source of forward and inverse models of human decisions for robotics and control

MK Ho, TL Griffiths - Annual Review of Control, Robotics, and …, 2022 - annualreviews.org
Those designing autonomous systems that interact with humans will invariably face
questions about how humans think and make decisions. Fortunately, computational …

Reconciling truthfulness and relevance as epistemic and decision-theoretic utility

TR Sumers, MK Ho, TL Griffiths… - Psychological Review, 2024 - psycnet.apa.org
People use language to influence others' beliefs and actions. Yet models of communication
have diverged along these lines, formalizing the speaker's objective in terms of either the …