Legilimens: Practical and Unified Content Moderation for Large Language Model Services

J Wu, J Deng, S Pang, Y Chen, J Xu, X Li… - Proceedings of the 2024 …, 2024 - dl.acm.org
Given the societal impact of unsafe content generated by large language models (LLMs),
ensuring that LLM services comply with safety standards is a crucial concern for LLM service …

Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms

J Ashkinaze, R Guan, L Kurek, E Adar, C Budak… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are trained on broad corpora and then used in communities
with specialized norms. Is providing LLMs with community rules enough for models to follow …

AI Rules? Characterizing Reddit Community Policies Towards AI-Generated Content

T Lloyd, J Gosciak, T Nguyen, M Naaman - arXiv preprint arXiv …, 2024 - arxiv.org
How are Reddit communities responding to AI-generated content? We explored this
question through a large-scale analysis of subreddit community rules and their change over …

" They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations

PPS Dammu, H Jung, A Singh, M Choudhury… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have emerged as an integral part of modern societies,
powering user-facing applications such as personal assistants and enterprise applications …