Pragmatic-pedagogic value alignment

AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Cited by 198; 3 versions

Building machines that learn and think with people

KM Collins, I Sucholutsky, U Bhatt, K Chandra… - Nature Human …, 2024 - nature.com

What do we want from machine intelligence? We envision machines that are not just tools
for thought but partners in thought: reasonable, insightful, knowledgeable, reliable and …

Cited by 9; 9 versions

Machine theory of mind

N Rabinowitz, F Perbet, F Song… - International …, 2018 - proceedings.mlr.press

Theory of mind (ToM) broadly refers to humans' ability to represent the mental
states of others, including their desires, beliefs, and intentions. We design a Theory of Mind …

Cited by 661; 8 versions

Modeling others using oneself in multi-agent reinforcement learning

R Raileanu, E Denton, A Szlam… - … conference on machine …, 2018 - proceedings.mlr.press

We consider the multi-agent reinforcement learning setting with imperfect information. The
reward function depends on the hidden goals of both agents, so the agents must infer the …

Cited by 251; 6 versions

Safe imitation learning via fast Bayesian reward inference from preferences

D Brown, R Coleman, R Srinivasan… - … on Machine Learning, 2020 - proceedings.mlr.press

Bayesian reward learning from demonstrations enables rigorous safety and uncertainty
analysis when performing imitation learning. However, Bayesian reward learning methods …

Cited by 129; 10 versions

Verification for machine learning, autonomy, and neural networks survey

W Xiang, P Musau, AA Wild, DM Lopez… - arXiv preprint arXiv …, 2018 - arxiv.org

This survey presents an overview of verification techniques for autonomous systems, with a
focus on safety-critical autonomous cyber-physical systems (CPS) and subcomponents …

Cited by 121; 5 versions

Human-in-the-loop imitation learning using remote teleoperation

A Mandlekar, D Xu, R Martín-Martín, Y Zhu… - arXiv preprint arXiv …, 2020 - arxiv.org

Imitation Learning is a promising paradigm for learning complex robot manipulation skills by
reproducing behavior from human demonstrations. However, manipulation tasks often …

Cited by 76; 3 versions

Validating metrics for reward alignment in human-autonomy teaming

L Sanneman, JA Shah - Computers in Human Behavior, 2023 - Elsevier

Alignment of human and autonomous agent values and objectives is vital in human-
autonomy teaming settings which require collaborative action toward a common goal. In …

Cited by 13; 2 versions

Cognitive science as a source of forward and inverse models of human decisions for robotics and control

MK Ho, TL Griffiths - Annual Review of Control, Robotics, and …, 2022 - annualreviews.org

Those designing autonomous systems that interact with humans will invariably face
questions about how humans think and make decisions. Fortunately, computational …

Cited by 44; 9 versions

Reconciling truthfulness and relevance as epistemic and decision-theoretic utility.

TR Sumers, MK Ho, TL Griffiths… - Psychological Review, 2024 - psycnet.apa.org

People use language to influence others' beliefs and actions. Yet models of communication
have diverged along these lines, formalizing the speaker's objective in terms of either the …

Cited by 17; 13 versions