Samuel Marks
Postdoctoral researcher, Northeastern University
Verified email at northeastern.edu
Title
Cited by
Year
Open problems and fundamental limitations of reinforcement learning from human feedback
S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...
arXiv preprint arXiv:2307.15217, 2023
Cited by 280 · 2023
The geometry of truth: Emergent linear structure in large language model representations of true/false datasets
S Marks, M Tegmark
arXiv preprint arXiv:2310.06824, 2023
Cited by 52 · 2023
Sparse feature circuits: Discovering and editing interpretable causal graphs in language models
S Marks, C Rager, EJ Michaud, Y Belinkov, D Bau, A Mueller
arXiv preprint arXiv:2403.19647, 2024
Cited by 22 · 2024
Open problems and fundamental limitations of reinforcement learning from human feedback. CoRR, abs/2307.15217, 2023. doi: 10.48550
S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...
arXiv preprint arXiv:2307.15217
Cited by 7
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback (arXiv: 2307.15217). arXiv
S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...
Cited by 5 · 2023
& Hadfield-Menell, D.(2023). Open problems and fundamental limitations of reinforcement learning from human feedback
S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...
arXiv preprint arXiv:2307.15217
Cited by 5
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
C Denison, M MacDiarmid, F Barez, D Duvenaud, S Kravec, S Marks, ...
arXiv preprint arXiv:2406.10162, 2024
Cited by 4 · 2024
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data
J Treutlein, D Choi, J Betley, C Anil, S Marks, RB Grosse, O Evans
arXiv preprint arXiv:2406.14546, 2024
Cited by 2 · 2024
Measuring progress in dictionary learning for language model interpretability with board game models
A Karvonen, B Wright, C Rager, R Angell, J Brinkmann, L Smith, ...
arXiv preprint arXiv:2408.00113, 2024
Cited by 1 · 2024
The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability
A Mueller, J Brinkmann, M Li, S Marks, K Pal, N Prakash, C Rager, ...
arXiv preprint arXiv:2408.01416, 2024
2024
NNsight and NDIF: Democratizing Access to Foundation Model Internals
J Fiotto-Kaufman, AR Loftus, E Todd, J Brinkmann, C Juang, K Pal, ...
arXiv preprint arXiv:2407.14561, 2024
2024
Prismatic F-crystals and Lubin-Tate (φq, Γ)-modules
S Marks
arXiv preprint arXiv:2303.07620, 2023
2023
Laurent F-Crystals and Lubin-Tate (φq, Γ)-Modules
S Marks
Harvard University, 2023
2023
p-adic Modular Forms à la Serre
S Marks
2020
Derivatives of p-adic Siegel Eisenstein series and p-adic degrees of arithmetic cycles
SP Marks
Princeton University, 2019
2019
p-Adic Properties of Hauptmoduln with Applications to Moonshine
RC Chen, S Marks, M Tyler
SIGMA. Symmetry, Integrability and Geometry: Methods and Applications 15, 033, 2019
2019
Prismatic F-crystals and Lubin-Tate (φq, Γ)-modules
S Marks