Title, authors, venue | Cited by | Year |
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ... arXiv preprint arXiv:2206.04615, 2022 | 938 | 2022 |
Deep Convolutional Networks as shallow Gaussian Processes A Garriga-Alonso, L Aitchison, CE Rasmussen arXiv preprint arXiv:1808.05587, 2018 | 286 | 2018 |
Bayesian neural network priors revisited V Fortuin, A Garriga-Alonso, SW Ober, F Wenzel, G Rätsch, RE Turner, ... arXiv preprint arXiv:2102.06571, 2021 | 144 | 2021 |
Towards automated circuit discovery for mechanistic interpretability A Conmy, A Mavor-Parker, A Lynch, S Heimersheim, A Garriga-Alonso Advances in Neural Information Processing Systems 36, 16318-16352, 2023 | 112 | 2023 |
Understanding variational inference in function-space DR Burt, SW Ober, A Garriga-Alonso, M van der Wilk arXiv preprint arXiv:2011.09421, 2020 | 45 | 2020 |
Causal scrubbing: A method for rigorously testing interpretability hypotheses L Chan, A Garriga-Alonso, N Goldowsky-Dill, R Greenblatt, ... AI Alignment Forum, 10, 2022 | 44 | 2022 |
Exact Langevin dynamics with stochastic gradients A Garriga-Alonso, V Fortuin arXiv preprint arXiv:2102.01691, 2021 | 37 | 2021 |
Data augmentation in Bayesian neural networks and the cold posterior effect S Nabarro, S Ganev, A Garriga-Alonso, V Fortuin, M van der Wilk, ... Uncertainty in Artificial Intelligence, 1434-1444, 2022 | 31 | 2022 |
BNNpriors: A library for Bayesian neural network inference with different prior distributions V Fortuin, A Garriga-Alonso, M van der Wilk, L Aitchison Software Impacts 9, 100079, 2021 | 25 | 2021 |
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ... arXiv preprint arXiv:2206.04615, 2022 | 15 | 2022 |
Correlated weights in infinite limits of deep convolutional neural networks A Garriga-Alonso, M van der Wilk Uncertainty in Artificial Intelligence, 1998-2007, 2021 | 6 | 2021 |
Probability Density Imputation of Missing Data with Gaussian Mixture Models A Garriga-Alonso University of Oxford, 2017 | 1 | 2017 |
Solving Montezuma's Revenge with Planning and Reinforcement Learning A Garriga-Alonso Universitat Pompeu Fabra, 2016 | 1 | 2016 |
Planning behavior in a recurrent neural network that plays Sokoban A Garriga-Alonso, M Taufeeque, A Gleave arXiv preprint arXiv:2407.15421, 2024 | | 2024 |
Adversarial Circuit Evaluation A Garriga-Alonso arXiv preprint arXiv:2407.15166, 2024 | | 2024 |
Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification T Kwa, D Thomas, A Garriga-Alonso arXiv preprint arXiv:2407.14503, 2024 | | 2024 |
Investigating the Indirect Object Identification circuit in Mamba D Ensign, A Garriga-Alonso arXiv preprint arXiv:2407.14008, 2024 | | 2024 |
InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques R Gupta, I Arcuschin, T Kwa, A Garriga-Alonso arXiv preprint arXiv:2407.14494, 2024 | | 2024 |
Analyzing the Generalization and Reliability of Steering Vectors (ICML 2024) D Tan, D Chanin, A Lynch, D Kanoulas, B Paige, A Garriga-Alonso, R Kirk arXiv preprint arXiv:2407.12404, 2024 | | 2024 |
Priors in finite and infinite Bayesian convolutional neural networks A Garriga-Alonso | | 2023 |