Title | Authors | Venue | Citations | Year |
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback | S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ... | arXiv preprint arXiv:2307.15217, 2023 | 239 | 2023 |
Towards understanding grokking: An effective theory of representation learning | Z Liu, O Kitouni, NS Nolte, E Michaud, M Tegmark, M Williams | Advances in Neural Information Processing Systems 35, 34651-34663, 2022 | 90 | 2022 |
Omnigrok: Grokking beyond algorithmic data | Z Liu, EJ Michaud, M Tegmark | The Eleventh International Conference on Learning Representations, 2022 | 64 | 2022 |
The Quantization Model of Neural Scaling | EJ Michaud, Z Liu, U Girit, M Tegmark | Advances in Neural Information Processing Systems 36, 2023 | 37 | 2023 |
Understanding Learned Reward Functions | EJ Michaud, A Gleave, S Russell | Deep RL Workshop, NeurIPS 2020, 2020 | 29 | 2020 |
Precision Machine Learning | EJ Michaud, Z Liu, M Tegmark | Entropy 25 (1), 175, 2023 | 19 | 2023 |
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models | S Marks, C Rager, EJ Michaud, Y Belinkov, D Bau, A Mueller | arXiv preprint arXiv:2403.19647, 2024 | 13 | 2024 |
Examining the Causal Structures of Deep Neural Networks Using Information Theory | S Marrow, EJ Michaud, E Hoel | Entropy 22 (12), 1429, 2020 | 7* | 2020 |
Opening the AI black box: program synthesis via mechanistic interpretability | EJ Michaud, I Liao, V Lad, Z Liu, A Mudide, C Loughridge, ZC Guo, ... | arXiv preprint arXiv:2402.05110, 2024 | 6 | 2024 |
Not All Language Model Features Are Linear | J Engels, I Liao, EJ Michaud, W Gurnee, M Tegmark | arXiv preprint arXiv:2405.14860, 2024 | 1 | 2024 |
Lunar Opportunities for SETI | EJ Michaud, APV Siemion, J Drew, SP Worden | arXiv preprint arXiv:2009.12689, 2020 | 1 | 2020 |
Survival of the Fittest Representation: A Case Study with Modular Addition | X Delores Ding, ZC Guo, EJ Michaud, Z Liu, M Tegmark | arXiv e-prints, arXiv:2405.17420, 2024 | | 2024 |
SETI from the Lunar South Pole | EJ Michaud, APV Siemion, J Drew, SP Worden | | | |