AI alignment: A comprehensive survey
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …
Building machines that learn and think with people
What do we want from machine intelligence? We envision machines that are not just tools
for thought but partners in thought: reasonable, insightful, knowledgeable, reliable and …
Machine theory of mind
Theory of mind (ToM) broadly refers to humans' ability to represent the mental
states of others, including their desires, beliefs, and intentions. We design a Theory of Mind …
Modeling others using oneself in multi-agent reinforcement learning
We consider the multi-agent reinforcement learning setting with imperfect information. The
reward function depends on the hidden goals of both agents, so the agents must infer the …
Safe imitation learning via fast Bayesian reward inference from preferences
Bayesian reward learning from demonstrations enables rigorous safety and uncertainty
analysis when performing imitation learning. However, Bayesian reward learning methods …
Verification for machine learning, autonomy, and neural networks survey
This survey presents an overview of verification techniques for autonomous systems, with a
focus on safety-critical autonomous cyber-physical systems (CPS) and subcomponents …
Human-in-the-loop imitation learning using remote teleoperation
Imitation Learning is a promising paradigm for learning complex robot manipulation skills by
reproducing behavior from human demonstrations. However, manipulation tasks often …
Validating metrics for reward alignment in human-autonomy teaming
L Sanneman, JA Shah - Computers in Human Behavior, 2023 - Elsevier
Alignment of human and autonomous agent values and objectives is vital in human-
autonomy teaming settings which require collaborative action toward a common goal. In …
Cognitive science as a source of forward and inverse models of human decisions for robotics and control
MK Ho, TL Griffiths - Annual Review of Control, Robotics, and …, 2022 - annualreviews.org
Those designing autonomous systems that interact with humans will invariably face
questions about how humans think and make decisions. Fortunately, computational …
Reconciling truthfulness and relevance as epistemic and decision-theoretic utility
People use language to influence others' beliefs and actions. Yet models of communication
have diverged along these lines, formalizing the speaker's objective in terms of either the …