Paul Röttger
Postdoctoral Researcher, Bocconi University
Verified email at unibocconi.it - Homepage
Title · Cited by · Year
HateCheck: Functional Tests for Hate Speech Detection Models
P Röttger, B Vidgen, D Nguyen, Z Waseem, H Margetts, J Pierrehumbert
ACL 2021 (Main) - 🏆 Stanford HAI AI Audit Challenge, 2021
207 · 2021
Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks
P Röttger, B Vidgen, D Hovy, JB Pierrehumbert
NAACL 2022 (Main), 2022
109 · 2022
SemEval-2023 Task 10: Explainable Detection of Online Sexism
HR Kirk, W Yin, B Vidgen, P Röttger
ACL 2023 (Main) - 🏆 Best Task Paper, 2023
85 · 2023
The benefits, risks and bounds of personalizing the alignment of large language models to individuals
HR Kirk, B Vidgen, P Röttger, SA Hale
Nature Machine Intelligence, 2024
79* · 2024
Temporal Adaptation of BERT and Performance on Downstream Document Classification: Insights from Social Media
P Röttger, JB Pierrehumbert
EMNLP 2021 (Findings), 2021
56 · 2021
Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
HR Kirk, B Vidgen, P Röttger, T Thrush, SA Hale
NAACL 2022 (Main), 2021
51 · 2021
Safety-tuned llamas: Lessons from improving the safety of large language models that follow instructions
F Bianchi, M Suzgun, G Attanasio, P Röttger, D Jurafsky, T Hashimoto, ...
ICLR 2024 (Poster), 2023
45 · 2023
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
P Röttger, HR Kirk, B Vidgen, G Attanasio, F Bianchi, D Hovy
NAACL 2024 (Main), 2023
42 · 2023
Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models
P Röttger, H Seelawi, D Nozza, Z Talat, B Vidgen
WOAH at NAACL 2022, 2022
40 · 2022
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values
HR Kirk, AM Bean, B Vidgen, P Röttger, SA Hale
EMNLP 2023 (Main), 2023
17 · 2023
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
P Röttger, V Hofmann, V Pyatkin, M Hinck, HR Kirk, H Schütze, D Hovy
ACL 2024 (Main), 2024
11 · 2024
"My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models
X Wang, B Ma, C Hu, L Weber-Genzel, P Röttger, F Kreuter, D Hovy, ...
ACL 2024 (Findings), 2024
11 · 2024
The Ecological Fallacy in Annotation: Modelling Human Label Variation goes beyond Sociodemographics
M Orlikowski, P Röttger, P Cimiano, D Hovy
ACL 2023 (Main), 2023
11 · 2023
Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages
P Röttger, D Nozza, F Bianchi, D Hovy
EMNLP 2022 (Main), 2022
11 · 2022
The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models
HR Kirk, A Whitefield, P Röttger, A Bean, K Margatina, J Ciro, R Mosquera, ...
arXiv preprint arXiv:2404.16019, 2024
8 · 2024
SimpleSafetyTests: A Test Suite for Identifying Critical Safety Risks in Large Language Models
B Vidgen, HR Kirk, R Qian, N Scherrer, A Kannappan, SA Hale, P Röttger
arXiv preprint arXiv:2311.08370, 2023
8 · 2023
The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising “Alignment” in Large Language Models
HR Kirk, B Vidgen, P Röttger, SA Hale
NeurIPS 2023 (SoLaR Workshop), 2023
5 · 2023
Introducing v0.5 of the AI Safety Benchmark from MLCommons
B Vidgen, A Agrawal, AM Ahmed, V Akinwande, N Al-Nuaimi, N Alfaraj, ...
arXiv preprint arXiv:2404.12241, 2024
3 · 2024
SafetyPrompts: A Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
P Röttger, F Pernisi, B Vidgen, D Hovy
arXiv preprint arXiv:2404.05399, 2024
3 · 2024
Near to Mid-term Risks and Opportunities of Open Source Generative AI
F Eiras, A Petrov, B Vidgen, CS de Witt, F Pizzati, K Elkins, ...
ICML 2024 (Oral), 2024
2 · 2024
Articles 1–20