Title, authors, venue | Cited by | Year |
Cooperative Inverse Reinforcement Learning D Hadfield-Menell, SJ Russell, P Abbeel, A Dragan Advances in Neural Information Processing Systems 29, 2016 | 751 | 2016 |
Inverse Reward Design D Hadfield-Menell, S Milli, P Abbeel, SJ Russell, A Dragan Advances in Neural Information Processing Systems 30, 2017 | 433 | 2017 |
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ... Transactions on Machine Learning Research, 2023 | 236 | 2023 |
The off-switch game D Hadfield-Menell, A Dragan, P Abbeel, S Russell Proceedings of the Twenty-Sixth International Joint Conference on Artificial …, 2017 | 164 | 2017 |
Toward Transparent AI: A survey on interpreting the inner structures of deep neural networks T Räuker, A Ho, S Casper, D Hadfield-Menell 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 464-483, 2023 | 120 | 2023 |
On the geometry of adversarial examples M Khoury, D Hadfield-Menell arXiv preprint arXiv:1811.00525, 2018 | 103* | 2018 |
Pragmatic-pedagogic value alignment JF Fisac, MA Gates, JB Hamrick, C Liu, D Hadfield-Menell, ... Robotics Research: The 18th International Symposium ISRR, 49-57, 2020 | 97 | 2020 |
Guided search for task and motion plans using learned heuristics R Chitnis, D Hadfield-Menell, A Gupta, S Srivastava, E Groshev, C Lin, ... 2016 IEEE International Conference on Robotics and Automation (ICRA), 447-454, 2016 | 81 | 2016 |
Incomplete contracting and AI alignment D Hadfield-Menell, GK Hadfield Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 417-422, 2019 | 78 | 2019 |
Should robots be obedient? S Milli, D Hadfield-Menell, A Dragan, S Russell Proceedings of the 26th International Joint Conference on Artificial …, 2017 | 76 | 2017 |
What are you optimizing for? Aligning recommender systems with human values J Stray, I Vendrov, J Nixon, S Adler, D Hadfield-Menell arXiv preprint arXiv:2107.10939, 2021 | 71 | 2021 |
Conservative Agency via Attainable Utility Preservation AM Turner, D Hadfield-Menell, P Tadepalli Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 385-391, 2020 | 65 | 2020 |
Consequences of Misaligned AI S Zhuang, D Hadfield-Menell Advances in Neural Information Processing Systems 33, 15763-15773, 2020 | 65 | 2020 |
On the utility of model learning in HRI R Choudhury, G Swamy, D Hadfield-Menell, AD Dragan 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI …, 2019 | 61 | 2019 |
Expressive robot motion timing A Zhou, D Hadfield-Menell, A Nagabandi, AD Dragan Proceedings of the 2017 ACM/IEEE international conference on human-robot …, 2017 | 59 | 2017 |
Modular task and motion planning in belief space D Hadfield-Menell, E Groshev, R Chitnis, P Abbeel 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems …, 2015 | 55 | 2015 |
Explore, establish, exploit: Red teaming language models from scratch S Casper, J Lin, J Kwon, G Culp, D Hadfield-Menell arXiv preprint arXiv:2306.09442, 2023 | 49 | 2023 |
The assistive multi-armed bandit L Chan, D Hadfield-Menell, S Srinivasa, A Dragan 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI …, 2019 | 48 | 2019 |
Spurious normativity enhances learning of compliance and enforcement behavior in artificial agents R Köster, D Hadfield-Menell, R Everett, L Weidinger, GK Hadfield, ... Proceedings of the National Academy of Sciences 119 (3), e2106028118, 2022 | 45* | 2022 |
An efficient, generalized Bellman update for cooperative inverse reinforcement learning D Malik, M Palaniappan, J Fisac, D Hadfield-Menell, S Russell, A Dragan International Conference on Machine Learning, 3394-3402, 2018 | 43 | 2018 |