STELA: a community-centred approach to norm elicitation for AI alignment

I Gabriel, A Manzini, G Keeling, LA Hendricks… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper focuses on the opportunities and the ethical and societal risks posed by
advanced AI assistants. We define advanced AI assistants as artificial agents with natural …

Cited by 36  Related articles  All 2 versions

[PDF] arxiv.org

The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of …

HR Kirk, A Whitefield, P Röttger, A Bean… - arXiv preprint arXiv …, 2024 - arxiv.org

Human feedback plays a central role in the alignment of Large Language Models (LLMs).
However, open questions remain about the methods (how), domains (where), people (who) …

Cited by 40  Related articles  All 2 versions

[PDF] springer.com

AI ethics as a complex and multifaceted challenge: decoding educators' AI ethics alignment through the lens of activity theory

J Kamali, MF Alpat, A Bozkurt - International Journal of Educational …, 2024 - Springer

This study explores university educators' perspectives on their alignment with artificial
intelligence (AI) ethics, considering activity theory (AT), which forms the theoretical …

Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks

L Ibrahim, S Huang, L Ahmad, M Anderljung - arXiv preprint arXiv …, 2024 - arxiv.org

Model evaluations are central to understanding the safety, risks, and societal impacts of AI
systems. While most real-world AI applications involve human-AI interaction, most current …

Cited by 10  Related articles  All 2 versions

[PDF] acm.org

Participation in the age of foundation models

H Suresh, E Tseng, M Young, M Gray… - The 2024 ACM …, 2024 - dl.acm.org

Growing interest and investment in the capabilities of foundation models has positioned
such systems to impact a wide array of services, from banking to healthcare. Alongside …

Cited by 10  Related articles  All 5 versions

[PDF] arxiv.org

ValueScope: Unveiling Implicit Norms and Values via Return Potential Model of Social Interactions

CY Park, SS Li, H Jung, S Volkova, T Mitra… - arXiv preprint arXiv …, 2024 - arxiv.org

This study introduces ValueScope, a framework leveraging language models to quantify
social norms and values within online communities, grounded in social science perspectives …

Cited by 2  Related articles  All 4 versions

Cited by 7  Related articles