Adrià Garriga-Alonso
Research Scientist, FAR AI
Verified email at far.ai
Title
Cited by
Year
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models
A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ...
arXiv preprint arXiv:2206.04615, 2022
Cited by: 938 | Year: 2022
Deep Convolutional Networks as shallow Gaussian Processes
A Garriga-Alonso, L Aitchison, CE Rasmussen
arXiv preprint arXiv:1808.05587, 2018
Cited by: 286 | Year: 2018
Bayesian neural network priors revisited
V Fortuin, A Garriga-Alonso, SW Ober, F Wenzel, G Rätsch, RE Turner, ...
arXiv preprint arXiv:2102.06571, 2021
Cited by: 144 | Year: 2021
Towards automated circuit discovery for mechanistic interpretability
A Conmy, A Mavor-Parker, A Lynch, S Heimersheim, A Garriga-Alonso
Advances in Neural Information Processing Systems 36, 16318-16352, 2023
Cited by: 112 | Year: 2023
Understanding variational inference in function-space
DR Burt, SW Ober, A Garriga-Alonso, M van der Wilk
arXiv preprint arXiv:2011.09421, 2020
Cited by: 45 | Year: 2020
Causal scrubbing: A method for rigorously testing interpretability hypotheses
L Chan, A Garriga-Alonso, N Goldowsky-Dill, R Greenblatt, ...
AI Alignment Forum, 10, 2022
Cited by: 44 | Year: 2022
Exact Langevin dynamics with stochastic gradients
A Garriga-Alonso, V Fortuin
arXiv preprint arXiv:2102.01691, 2021
Cited by: 37 | Year: 2021
Data augmentation in Bayesian neural networks and the cold posterior effect
S Nabarro, S Ganev, A Garriga-Alonso, V Fortuin, M van der Wilk, ...
Uncertainty in Artificial Intelligence, 1434-1444, 2022
Cited by: 31 | Year: 2022
BNNpriors: A library for Bayesian neural network inference with different prior distributions
V Fortuin, A Garriga-Alonso, M van der Wilk, L Aitchison
Software Impacts 9, 100079, 2021
Cited by: 25 | Year: 2021
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models, 2022
A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ...
arXiv preprint arXiv:2206.04615, 2022
Cited by: 15 | Year: 2022
Correlated weights in infinite limits of deep convolutional neural networks
A Garriga-Alonso, M van der Wilk
Uncertainty in Artificial Intelligence, 1998-2007, 2021
Cited by: 6 | Year: 2021
Probability Density Imputation of Missing Data with Gaussian Mixture Models
A Garriga-Alonso
University of Oxford, 2017
Cited by: 1 | Year: 2017
Solving Montezuma's Revenge with Planning and Reinforcement Learning
A Garriga-Alonso
Universitat Pompeu Fabra, 2016
Cited by: 1 | Year: 2016
Planning behavior in a recurrent neural network that plays Sokoban
A Garriga-Alonso, M Taufeeque, A Gleave
arXiv preprint arXiv:2407.15421, 2024
2024
Adversarial Circuit Evaluation
A Garriga-Alonso
arXiv preprint arXiv:2407.15166, 2024
2024
Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification
T Kwa, D Thomas, A Garriga-Alonso
arXiv preprint arXiv:2407.14503, 2024
2024
Investigating the Indirect Object Identification circuit in Mamba
D Ensign, A Garriga-Alonso
arXiv preprint arXiv:2407.14008, 2024
2024
InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques
R Gupta, I Arcuschin, T Kwa, A Garriga-Alonso
arXiv preprint arXiv:2407.14494, 2024
2024
Analyzing the Generalization and Reliability of Steering Vectors--ICML 2024
D Tan, D Chanin, A Lynch, D Kanoulas, B Paige, A Garriga-Alonso, R Kirk
arXiv preprint arXiv:2407.12404, 2024
2024
Priors in finite and infinite Bayesian convolutional neural networks
A Garriga-Alonso
2023