| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small | K Wang, A Variengien, A Conmy, B Shlegeris, J Steinhardt | ICLR 2023 | 238 | 2022 |
| Towards Automated Circuit Discovery for Mechanistic Interpretability | A Conmy, AN Mavor-Parker, A Lynch, S Heimersheim, A Garriga-Alonso | NeurIPS 2023 (Spotlight) | 116 | 2023 |
| Stealing Part of a Production Language Model | N Carlini, D Paleka, KD Dvijotham, T Steinke, J Hayase, AF Cooper, ... | ICML 2024 (Oral) | 19 | 2024 |
| Attribution Patching Outperforms Automated Circuit Discovery | A Syed, C Rager, A Conmy | NeurIPS 2023 Workshop on Attributing Model Behavior at Scale | 16 | 2023 |
| Copy Suppression: Comprehensively Understanding an Attention Head | C McDougall, A Conmy, C Rushing, T McGrath, N Nanda | NeurIPS 2023 Workshop on Attributing Model Behavior at Scale | 16 | 2023 |
| Successor Heads: Recurring, Interpretable Attention Heads In The Wild | R Gould, E Ong, G Ogden, A Conmy | ICLR 2024 | 11 | 2023 |
| Interpreting Attention Layer Outputs with Sparse Autoencoders | C Kissane, R Krzyzanowski, JI Bloom, A Conmy, N Nanda | ICML 2024 Mechanistic Interpretability Workshop (Spotlight) | 9* | 2024 |
| Improving Dictionary Learning with Gated Sparse Autoencoders | S Rajamanoharan, A Conmy, L Smith, T Lieberum, V Varma, J Kramár, ... | ICML 2024 Mechanistic Interpretability Workshop | 8* | 2024 |
| StyleGAN-induced Data-Driven Regularization for Inverse Problems | A Conmy, S Mukherjee, CB Schönlieb | IEEE ICASSP 2022 | 5 | 2022 |
| Activation Steering with SAEs | A Conmy, N Nanda | www.alignmentforum.org/posts/C5KAZQib3bzzpeyrg | | 2024 |