| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| AI safety gridworlds | J Leike, M Martic, V Krakovna, PA Ortega, T Everitt, A Lefrancq, L Orseau, ... | arXiv preprint arXiv:1711.09883, 2017 | 328 | 2017 |
| Scalable agent alignment via reward modeling: a research direction | J Leike, D Krueger, T Everitt, M Martic, V Maini, S Legg | arXiv preprint arXiv:1811.07871, 2018 | 278 | 2018 |
| AGI safety literature review | T Everitt, G Lea, M Hutter | International Joint Conference on AI (IJCAI), 2018 | 147 | 2018 |
| Count-based exploration in feature space for reinforcement learning | J Martin, SN Sasikumar, T Everitt, M Hutter | International Joint Conference on AI (IJCAI), 2017 | 133 | 2017 |
| Alignment of language agents | Z Kenton, T Everitt, L Weidinger, I Gabriel, V Mikulik, G Irving | arXiv preprint arXiv:2103.14659, 2021 | 132 | 2021 |
| Reinforcement Learning with Corrupted Reward Channel | T Everitt, V Krakovna, L Orseau, M Hutter, S Legg | 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017 | 125 | 2017 |
| Specification gaming: the flip side of AI ingenuity | V Krakovna, J Uesato, V Mikulik, M Rahtz, T Everitt, R Kumar, Z Kenton, ... | DeepMind Blog 3, 2020 | 94 | 2020 |
| Reward tampering problems and solutions in reinforcement learning: A causal influence diagram perspective | T Everitt, M Hutter, R Kumar, V Krakovna | Synthese, 2021 | 89 | 2021 |
| Avoiding wireheading with value reinforcement learning | T Everitt, M Hutter | International Conference on Artificial General Intelligence (AGI), 12-22, 2016 | 49 | 2016 |
| Shaking the foundations: delusions in sequence models for interaction and control | PA Ortega, M Kunesch, G Delétang, T Genewein, J Grau-Moya, J Veness, ... | arXiv preprint arXiv:2110.10819, 2021 | 48 | 2021 |
| Agent incentives: A causal perspective | T Everitt, R Carey, ED Langlois, PA Ortega, S Legg | Proceedings of the AAAI Conference on Artificial Intelligence 35 (13), 11487 …, 2021 | 46 | 2021 |
| Towards safe artificial general intelligence | T Everitt | PQDT-Global, 2019 | 34 | 2019 |
| Understanding agent incentives using causal influence diagrams. Part I: Single action settings | T Everitt, PA Ortega, E Barnes, S Legg | arXiv preprint arXiv:1902.09980, 2019 | 32 | 2019 |
| Self-modification of policy and utility function in rational agents | T Everitt, D Filan, M Daswani, M Hutter | International Conference on Artificial General Intelligence (AGI), 1-11, 2016 | 31 | 2016 |
| Universal artificial intelligence: Practical agents and fundamental challenges | T Everitt, M Hutter | Foundations of trusted autonomy, 15-46, 2018 | 29 | 2018 |
| Modeling AGI safety frameworks with causal influence diagrams | T Everitt, R Kumar, V Krakovna, S Legg | arXiv preprint arXiv:1906.08663, 2019 | 24 | 2019 |
| Artificial general intelligence | T Everitt, B Goertzel, A Potapov | Lecture Notes in Artificial Intelligence. Heidelberg: Springer 92, 2017 | 24 | 2017 |
| Discovering agents | Z Kenton, R Kumar, S Farquhar, J Richens, M MacDermott, T Everitt | Artificial Intelligence 322, 103963, 2023 | 21 | 2023 |
| Path-specific objectives for safer agent incentives | S Farquhar, R Carey, T Everitt | Proceedings of the AAAI Conference on Artificial Intelligence 36 (9), 9529-9538, 2022 | 21 | 2022 |
| A game-theoretic analysis of the off-switch game | T Wängberg, M Böörs, E Catt, T Everitt, M Hutter | Artificial General Intelligence: 10th International Conference, AGI 2017 …, 2017 | 20 | 2017 |