Title | Authors | Venue | Cited by | Year |
A theory of representation learning gives a deep generalisation of kernel methods | AX Yang, M Robeyns, E Milsom, B Anson, N Schoots, L Aitchison | International Conference on Machine Learning, 39380-39415, 2023 | 13 | 2023 |
Improving activation steering in language models with mean-centring | O Jorgensen, D Cope, N Schoots, M Shanahan | Responsible Language Models @AAAI, 2023 | 11 | 2023 |
Dissecting Language Models: Machine Unlearning via Selective Pruning | N Pochinkov, N Schoots | arXiv preprint arXiv:2403.01267, 2024 | 6 | 2024 |
Any Deep ReLU Network is Shallow | MJ Villani, N Schoots | arXiv preprint arXiv:2306.11827, 2023 | 5 | 2023 |
Dissecting Large Language Models | N Pochinkov, N Schoots | Socially Responsible Language Modelling Research @NeurIPS, 2023 | 3 | 2023 |
Learning to Communicate with Strangers via Channel Randomisation Methods | D Cope, N Schoots | 4th Workshop on Emergent Communication at NeurIPS 2020, 2021 | 3 | 2021 |
Safety Properties of Inductive Logic Programming | G Leech, N Schoots, J Skalse | SafeAI @AAAI, 2021 | 2 | 2021 |
Comparing Optimization Targets for Contrast-Consistent Search | H Fry, S Fallows, I Fan, J Wright, N Schoots | Socially Responsible Language Modelling Research @NeurIPS, 2023 | 1 | 2023 |
Finding Sparse Initialisations using Neuroevolutionary Ticket Search (NeTS) | A Jackson, N Schoots, A Ahantab, M Luck, E Black | Artificial Life Conference Proceedings 35 2023 (1), 110, 2023 | 1 | 2023 |
The Propensity for Density in Feed-forward Models | N Schoots, A Jackson, A Kholmovaia, P McBurney, M Shanahan | 27th European Conference on Artificial Intelligence, 2024 | | 2024 |
Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs | Y Mathew, O Matthews, R McCarthy, J Velja, CS de Witt, D Cope, ... | arXiv preprint arXiv:2410.03768, 2024 | | 2024 |
Training Neural Networks for Modularity aids Interpretability | S Golechha, D Cope, N Schoots | arXiv preprint arXiv:2409.15747, 2024 | | 2024 |
Extending Activation Steering to Broad Skills and Multiple Behaviours | T van der Weij, M Poesio, N Schoots | arXiv preprint arXiv:2403.05767, 2024 | | 2024 |
Low-Entropy Latent Variables Hurt Out-of-Distribution Performance | N Schoots, D Cope | Domain Generalization @ICLR, 2023 | | 2023 |