Dan Braun
Apollo Research
Verified email at apolloresearch.ai
Title
Cited by
Year
Taking features out of superposition with sparse autoencoders
L Sharkey, D Braun, B Millidge
AI Alignment Forum 6, 12-13, 2022
Cited by 19*, 2022
Interpreting neural networks through the polytope lens
S Black, L Sharkey, L Grinsztajn, E Winsor, D Braun, J Merizian, K Parker, ...
arXiv preprint arXiv:2211.12312, 2022
Cited by 18, 2022
A Causal Framework for AI Regulation and Auditing
L Sharkey, CN Ghuidhir, D Braun, J Scheurer, M Balesni, L Bushnaq, ...
Preprints, 2024
Cited by 14*, 2024
Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning
D Braun, J Taylor, N Goldowsky-Dill, L Sharkey
arXiv preprint arXiv:2405.12241, 2024
Cited by 12, 2024
Using Degeneracy in the Loss Landscape for Mechanistic Interpretability
L Bushnaq, J Mendel, S Heimersheim, D Braun, N Goldowsky-Dill, ...
arXiv preprint arXiv:2405.10927, 2024
Cited by 5, 2024
Towards evaluations-based safety cases for AI scheming
M Balesni, M Hobbhahn, D Lindner, A Meinke, T Korbak, J Clymer, ...
arXiv preprint arXiv:2411.03336, 2024
Cited by 2, 2024
The local interaction basis: Identifying computationally-relevant and sparsely interacting features in neural networks
L Bushnaq, S Heimersheim, N Goldowsky-Dill, D Braun, J Mendel, ...
arXiv preprint arXiv:2405.10928, 2024
Cited by 2, 2024
Construction and Elicitation of a Black Box Model in the Game of Bridge
V Ventos, D Braun, C Deheeger, JP Desmoulins, JB Fantun, S Legras, ...
Advances in Knowledge Discovery and Management: Volume 10, 29-53, 2024
2024
Articles 1–8