Title. Authors. Venue | Cited by | Year |
A theory of representation learning gives a deep generalisation of kernel methods. AX Yang, M Robeyns, E Milsom, B Anson, N Schoots, L Aitchison. International Conference on Machine Learning, 39380-39415, 2023 | 19 | 2023 |
Improving activation steering in language models with mean-centring. O Jorgensen, D Cope, N Schoots, M Shanahan. Responsible Language Models @AAAI, 2023 | 14 | 2023 |
Any Deep ReLU Network is Shallow. MJ Villani, N Schoots. arXiv preprint arXiv:2306.11827, 2023 | 9 | 2023 |
Dissecting Language Models: Machine Unlearning via Selective Pruning. N Pochinkov, N Schoots. arXiv preprint arXiv:2403.01267, 2024 | 8 | 2024 |
Dissecting Large Language Models. N Pochinkov, N Schoots. Socially Responsible Language Modelling Research @NeurIPS, 2023 | 4 | 2023 |
Learning to Communicate with Strangers via Channel Randomisation Methods. D Cope, N Schoots. 4th Workshop on Emergent Communication at NeurIPS 2020, 2021 | 3 | 2021 |
Extending Activation Steering to Broad Skills and Multiple Behaviours. T van der Weij, M Poesio, N Schoots. arXiv preprint arXiv:2403.05767, 2024 | 2 | 2024 |
Finding Sparse Initialisations using Neuroevolutionary Ticket Search (NeTS). A Jackson, N Schoots, A Ahantab, M Luck, E Black. Artificial Life Conference Proceedings 35 2023 (1), 110, 2023 | 2 | 2023 |
Safety Properties of Inductive Logic Programming. G Leech, N Schoots, J Skalse. SafeAI @AAAI, 2021 | 2 | 2021 |
Comparing Optimization Targets for Contrast-Consistent Search. H Fry, S Fallows, I Fan, J Wright, N Schoots. Socially Responsible Language Modelling Research @NeurIPS, 2023 | 1 | 2023 |
Low-Entropy Latent Variables Hurt Out-of-Distribution Performance. N Schoots, D Cope. Domain Generalization @ICLR, 2023 | 1 | 2023 |
Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs. Y Mathew, O Matthews, R McCarthy, J Velja, CS de Witt, D Cope, ... arXiv preprint arXiv:2410.03768, 2024 | | 2024 |
Training Neural Networks for Modularity aids Interpretability. S Golechha, D Cope, N Schoots. arXiv preprint arXiv:2409.15747, 2024 | | 2024 |
Channel Randomisation Methods for Zero-Shot Communication. D Cope, N Schoots. ECAI 2024, 3620-3627, 2024 | | 2024 |
The Propensity for Density in Feed-forward Models. N Schoots, A Jackson, A Kholmovia, P McBurney, M Shanahan. ECAI 2024, 2830-2837, 2024 | | 2024 |
Steganography in Large Language Models: Investigating Emergence and Mitigations. Y Mathew, R McCarthy, O Matthews, J Velja, N Schoots, D Cope. Red Teaming GenAI: What Can We Learn from Adversaries? | | |
Emergence of Steganography Between Large Language Models. Y Mathew, R McCarthy, J Velja, O Matthews, N Schoots, D Cope. Workshop on Socially Responsible Language Modelling Research | | |