Adrià Garriga-Alonso: Academic Profile

Citations

	Total	Since 2019
Citations	1685	1679
h-index	10	10
i10-index	10	10

Citations per year: 2019: 33; 2020: 52; 2021: 126; 2022: 197; 2023: 649; 2024: 617

Open access publications

Available: 3 articles
Not available: 0 articles

Based on funders' mandatory open access policies

Co-authors

Laurence Aitchison, University of Bristol. Verified email at bristol.ac.uk
Mark van der Wilk, Associate Professor, University of Oxford. Verified email at cs.ox.ac.uk
Vincent Fortuin, Principal Investigator, Helmholtz AI & TU Munich. Verified email at tum.de
Aengus Lynch, University College London, MATS. Verified email at ucl.ac.uk
Sebastian W. Ober, PhD Student, University of Cambridge. Verified email at cam.ac.uk
Carl Edward Rasmussen, Professor of Machine Learning, University of Cambridge. Verified email at cam.ac.uk
Richard E Turner, Professor, University of Cambridge. Verified email at cam.ac.uk
Gunnar Rätsch, Professor, ETH Zürich. Verified email at inf.ethz.ch
Florian Wenzel, CTO/Co-founder at Mirelo AI. Verified email at mirelo.ai
Stefan Heimersheim, Institute of Astronomy, University of Cambridge. Verified email at cam.ac.uk
Augustine Mavor-Parker, University College London. Verified email at cs.ucl.ac.uk
David R. Burt, Massachusetts Institute of Technology. Verified email at mit.edu
Seth Nabarro, PhD Student, Imperial College London. Verified email at ic.ac.uk
Anders Jonsson, Artificial Intelligence and Machine Learning group, Universitat Pompeu Fabra. Verified email at upf.edu

Adrià Garriga-Alonso

Research Scientist, FAR AI

Verified email at far.ai

AI safety, interpretability


Title	Cited by	Year
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ... arXiv preprint arXiv:2206.04615, 2022	938	2022
Deep Convolutional Networks as shallow Gaussian Processes. A Garriga-Alonso, L Aitchison, CE Rasmussen. arXiv preprint arXiv:1808.05587, 2018	286	2018
Bayesian neural network priors revisited. V Fortuin, A Garriga-Alonso, SW Ober, F Wenzel, G Rätsch, RE Turner, ... arXiv preprint arXiv:2102.06571, 2021	144	2021
Towards automated circuit discovery for mechanistic interpretability. A Conmy, A Mavor-Parker, A Lynch, S Heimersheim, A Garriga-Alonso. Advances in Neural Information Processing Systems 36, 16318-16352, 2023	112	2023
Understanding variational inference in function-space. DR Burt, SW Ober, A Garriga-Alonso, M van der Wilk. arXiv preprint arXiv:2011.09421, 2020	45	2020
Causal scrubbing: A method for rigorously testing interpretability hypotheses. L Chan, A Garriga-Alonso, N Goldowsky-Dill, R Greenblatt, ... AI Alignment Forum, 10, 2022	44	2022
Exact Langevin dynamics with stochastic gradients. A Garriga-Alonso, V Fortuin. arXiv preprint arXiv:2102.01691, 2021	37	2021
Data augmentation in Bayesian neural networks and the cold posterior effect. S Nabarro, S Ganev, A Garriga-Alonso, V Fortuin, M van der Wilk, ... Uncertainty in Artificial Intelligence, 1434-1444, 2022	31	2022
BNNpriors: A library for Bayesian neural network inference with different prior distributions. V Fortuin, A Garriga-Alonso, M van der Wilk, L Aitchison. Software Impacts 9, 100079, 2021	25	2021
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models, 2022. A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ... arXiv preprint arXiv:2206.04615, 2022	15	2022
Correlated weights in infinite limits of deep convolutional neural networks. A Garriga-Alonso, M van der Wilk. Uncertainty in Artificial Intelligence, 1998-2007, 2021	6	2021
Probability Density Imputation of Missing Data with Gaussian Mixture Models. A Garriga-Alonso. University of Oxford, 2017	1	2017
Solving Montezuma's Revenge with Planning and Reinforcement Learning. A Garriga-Alonso. Universitat Pompeu Fabra, 2016	1	2016
Planning behavior in a recurrent neural network that plays Sokoban. A Garriga-Alonso, M Taufeeque, A Gleave. arXiv preprint arXiv:2407.15421, 2024		2024
Adversarial Circuit Evaluation. A Garriga-Alonso. arXiv preprint arXiv:2407.15166, 2024		2024
Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification. T Kwa, D Thomas, A Garriga-Alonso. arXiv preprint arXiv:2407.14503, 2024		2024
Investigating the Indirect Object Identification circuit in Mamba. D Ensign, A Garriga-Alonso. arXiv preprint arXiv:2407.14008, 2024		2024
InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques. R Gupta, I Arcuschin, T Kwa, A Garriga-Alonso. arXiv preprint arXiv:2407.14494, 2024		2024
Analyzing the Generalization and Reliability of Steering Vectors--ICML 2024. D Tan, D Chanin, A Lynch, D Kanoulas, B Paige, A Garriga-Alonso, R Kirk. arXiv preprint arXiv:2407.12404, 2024		2024
Priors in finite and infinite Bayesian convolutional neural networks. A Garriga Alonso		2023
