Rethink reporting of evaluation results in AI R Burnell, W Schellaert, J Burden, TD Ullman, F Martínez-Plumed, ... Science 380 (6641), 136-138, 2023 | 62* | 2023 |
Training on the Test Set: Mapping the System-Problem Space in AI J Hernández-Orallo, W Schellaert, F Martínez-Plumed Proceedings of the AAAI Conference on Artificial Intelligence 36 (11), 12256 …, 2022 | 5 | 2022 |
Reject Before You Run: Small Assessors Anticipate Big Language Models L Zhou, F Martínez-Plumed, J Hernández-Orallo, C Ferri, W Schellaert Evaluation Beyond Metrics Workshop @ IJCAI, 2022 | 5 | 2022 |
Animal-AI 3: What's New & Why You Should Care K Voudouris, I Alhas, W Schellaert, M Crosby, J Holmes, J Burden, ... arXiv preprint arXiv:2312.11414, 2023 | 4 | 2023 |
Your Prompt is My Command: On Assessing the Human-Centred Generality of Multimodal Models W Schellaert, F Martínez-Plumed, K Vold, J Burden, PAM Casares, ... Journal of Artificial Intelligence Research 77, 377-394, 2023 | 4* | 2023 |
Predictable Artificial Intelligence L Zhou, PA Moreno-Casares, F Martínez-Plumed, J Burden, R Burnell, ... arXiv preprint arXiv:2310.06167, 2023 | 2 | 2023 |
A Proposal for Scaling the Scaling Laws W Schellaert, R Hamon, F Martínez-Plumed, J Hernández-Orallo Proceedings of the First edition of the Workshop on the Scaling Behavior of …, 2024 | | 2024 |
Investigating Object Permanence in Deep Reinforcement Learning Agents K Voudouris, JD Liu, N Siwinska, W Schellaert, LG Cheke Proceedings of the Annual Meeting of the Cognitive Science Society 46, 2024 | | 2024 |
Assessing AI capabilities with education tests M Staneva, A Baret, Á Aso-Mollar, J Blass, SC Ponz, V Conitzer, U Cortes, ... OECD, 2023 | | 2023 |
General Interaction Battery: Simple Object Navigation and Affordances (Gibsona) D Rutar, W Schellaert, A Markelius, LG Cheke, J Hernández-Orallo Available at SSRN 4871025 | | |