Dylan Hadfield-Menell
Verified email at csail.mit.edu
Title
Cited by
Year
Cooperative Inverse Reinforcement Learning
D Hadfield-Menell, SJ Russell, P Abbeel, A Dragan
Advances in Neural Information Processing Systems 29, 2016
Cited by 751 (2016)
Inverse Reward Design
D Hadfield-Menell, S Milli, P Abbeel, SJ Russell, A Dragan
Advances in Neural Information Processing Systems 30, 2017
Cited by 433 (2017)
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...
Transactions on Machine Learning Research, 2023
Cited by 236 (2023)
The off-switch game
D Hadfield-Menell, A Dragan, P Abbeel, S Russell
Proceedings of the Twenty-Sixth International Joint Conference on Artificial …, 2017
Cited by 164 (2017)
Toward Transparent AI: A survey on interpreting the inner structures of deep neural networks
T Räuker, A Ho, S Casper, D Hadfield-Menell
2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 464-483, 2023
Cited by 120 (2023)
On the geometry of adversarial examples
M Khoury, D Hadfield-Menell
arXiv preprint arXiv:1811.00525, 2018
Cited by 103* (2018)
Pragmatic-pedagogic value alignment
JF Fisac, MA Gates, JB Hamrick, C Liu, D Hadfield-Menell, ...
Robotics Research: The 18th International Symposium ISRR, 49-57, 2020
Cited by 97 (2020)
Guided search for task and motion plans using learned heuristics
R Chitnis, D Hadfield-Menell, A Gupta, S Srivastava, E Groshev, C Lin, ...
2016 IEEE International Conference on Robotics and Automation (ICRA), 447-454, 2016
Cited by 81 (2016)
Incomplete contracting and AI alignment
D Hadfield-Menell, GK Hadfield
Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 417-422, 2019
Cited by 78 (2019)
Should robots be obedient?
S Milli, D Hadfield-Menell, A Dragan, S Russell
Proceedings of the 26th International Joint Conference on Artificial …, 2017
Cited by 76 (2017)
What are you optimizing for? aligning recommender systems with human values
J Stray, I Vendrov, J Nixon, S Adler, D Hadfield-Menell
arXiv preprint arXiv:2107.10939, 2021
Cited by 71 (2021)
Conservative Agency via Attainable Utility Preservation
AM Turner, D Hadfield-Menell, P Tadepalli
Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 385-391, 2020
Cited by 65 (2020)
Consequences of Misaligned AI
S Zhuang, D Hadfield-Menell
Advances in Neural Information Processing Systems 33, 15763-15773, 2020
Cited by 65 (2020)
On the utility of model learning in HRI
R Choudhury, G Swamy, D Hadfield-Menell, AD Dragan
2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI …, 2019
Cited by 61 (2019)
Expressive robot motion timing
A Zhou, D Hadfield-Menell, A Nagabandi, AD Dragan
Proceedings of the 2017 ACM/IEEE international conference on human-robot …, 2017
Cited by 59 (2017)
Modular task and motion planning in belief space
D Hadfield-Menell, E Groshev, R Chitnis, P Abbeel
2015 IEEE/RSJ International Conference on Intelligent Robots and Systems …, 2015
Cited by 55 (2015)
Explore, establish, exploit: Red teaming language models from scratch
S Casper, J Lin, J Kwon, G Culp, D Hadfield-Menell
arXiv preprint arXiv:2306.09442, 2023
Cited by 49 (2023)
The assistive multi-armed bandit
L Chan, D Hadfield-Menell, S Srinivasa, A Dragan
2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI …, 2019
Cited by 48 (2019)
Spurious normativity enhances learning of compliance and enforcement behavior in artificial agents
R Köster, D Hadfield-Menell, R Everett, L Weidinger, GK Hadfield, ...
Proceedings of the National Academy of Sciences 119 (3), e2106028118, 2022
Cited by 45* (2022)
An efficient, generalized Bellman update for cooperative inverse reinforcement learning
D Malik, M Palaniappan, J Fisac, D Hadfield-Menell, S Russell, A Dragan
International Conference on Machine Learning, 3394-3402, 2018
Cited by 43 (2018)