Abhay Sheshadri
Verified email at gatech.edu
Title
Cited by
Year
Eliciting Language Model Behaviors using Reverse Language Models
J Pfau, A Infanger, A Sheshadri, A Panda, J Michael, C Huebner
NeurIPS SOLAR Workshop, 2023
Cited by: 6, Year: 2023
A mechanistic analysis of a transformer trained on a symbolic multi-step reasoning task
J Brinkmann, A Sheshadri, V Levoso, P Swoboda, C Bartelt
arXiv preprint arXiv:2402.11917, 2024
Cited by: 5, Year: 2024
Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs
A Sheshadri, A Ewart, P Guo, A Lynch, C Wu, V Hebbar, H Sleight, ...
arXiv preprint arXiv:2407.15549, 2024
Cited by: 1, Year: 2024
Robust Unlearning via Mechanistic Localizations
PH Guo, A Syed, A Sheshadri, A Ewart, GK Dziugaite
ICML 2024 Workshop on Mechanistic Interpretability, 2024
Year: 2024
Robust Knowledge Unlearning via Mechanistic Localizations
PH Guo, A Syed, A Sheshadri, A Ewart, GK Dziugaite
ICML 2024 Next Generation of AI Safety Workshop
Backward Chaining Circuits in a Transformer Trained on a Symbolic Reasoning Task
J Brinkmann, A Sheshadri, V Levoso, P Swoboda, C Bartelt
ICLR 2024 Workshop on Mathematical and Empirical Understanding of Foundation …