Samuel Marks: Scholar Profile - Academic Resource Search

Cited by

	All	Since 2019
Citations	378	378
h-index	5	5
i10-index	3	3

Citations per year: 2023: 77; 2024: 301

Public access

1 article available
0 articles not available

Based on funding mandates

Samuel Marks

Postdoctoral researcher, Northeastern University

Verified email at northeastern.edu

large language models, interpretability, AI safety


Title	Cited by	Year
Open problems and fundamental limitations of reinforcement learning from human feedback S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ... arXiv preprint arXiv:2307.15217, 2023	280	2023
The geometry of truth: Emergent linear structure in large language model representations of true/false datasets S Marks, M Tegmark arXiv preprint arXiv:2310.06824, 2023	52	2023
Sparse feature circuits: Discovering and editing interpretable causal graphs in language models S Marks, C Rager, EJ Michaud, Y Belinkov, D Bau, A Mueller arXiv preprint arXiv:2403.19647, 2024	22	2024
Open problems and fundamental limitations of reinforcement learning from human feedback. CoRR, abs/2307.15217, 2023. doi: 10.48550 S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ... arXiv preprint arXiv:2307.15217	7
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback (arXiv: 2307.15217). arXiv S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...	5	2023
& Hadfield-Menell, D. (2023). Open problems and fundamental limitations of reinforcement learning from human feedback S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ... arXiv preprint arXiv:2307.15217	5
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models C Denison, M MacDiarmid, F Barez, D Duvenaud, S Kravec, S Marks, ... arXiv preprint arXiv:2406.10162, 2024	4	2024
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data J Treutlein, D Choi, J Betley, C Anil, S Marks, RB Grosse, O Evans arXiv preprint arXiv:2406.14546, 2024	2	2024
Measuring progress in dictionary learning for language model interpretability with board game models A Karvonen, B Wright, C Rager, R Angell, J Brinkmann, L Smith, ... arXiv preprint arXiv:2408.00113, 2024	1	2024
The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability A Mueller, J Brinkmann, M Li, S Marks, K Pal, N Prakash, C Rager, ... arXiv preprint arXiv:2408.01416, 2024		2024
NNsight and NDIF: Democratizing Access to Foundation Model Internals J Fiotto-Kaufman, AR Loftus, E Todd, J Brinkmann, C Juang, K Pal, ... arXiv preprint arXiv:2407.14561, 2024		2024
Prismatic F-crystals and Lubin-Tate (φq, Γ)-modules S Marks arXiv preprint arXiv:2303.07620, 2023		2023
Laurent F-Crystals and Lubin-Tate (φq, Γ)-Modules S Marks Harvard University, 2023		2023
p-adic Modular Forms à la Serre S Marks		2020
Derivatives of p-adic Siegel Eisenstein series and p-adic degrees of arithmetic cycles SP Marks Princeton University, 2019		2019
p-Adic Properties of Hauptmoduln with Applications to Moonshine RC Chen, S Marks, M Tyler SIGMA. Symmetry, Integrability and Geometry: Methods and Applications 15, 033, 2019		2019
Prismatic F-crystals and Lubin-Tate (φq, Γ)-modules S Marks
