Danny Halawi
Verified email at berkeley.edu
Title
Cited by
Year
Eliciting latent predictions from transformers with the tuned lens
N Belrose, Z Furman, L Smith, D Halawi, I Ostrovsky, L McKinney, ...
arXiv preprint arXiv:2303.08112, 2023
89
2023
Overthinking the truth: Understanding how language models process false demonstrations
D Halawi, JS Denain, J Steinhardt
ICLR 2024, 2023
23
2023
Approaching Human-Level Forecasting with Language Models
D Halawi, F Zhang, C Yueh-Han, J Steinhardt
arXiv preprint arXiv:2402.18563, 2024
8
2024
Verifying source citations in the hadith literature
M Syed, D Halawi, B Sadeghi, N Saquib
Journal of Medieval Worlds 1 (3), 5-20, 2019
4
2019
Trophic analysis of a historical network reveals temporal information
C Shuaib, M Syed, D Halawi, N Saquib
Applied Network Science 7 (1), 31, 2022
3
2022
Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation
D Halawi, A Wei, E Wallace, TT Wang, N Haghtalab, J Steinhardt
ICML 2024, 2024
2024
Dominion: A New Frontier for AI Research
D Halawi, A Sarmasi, S Saltzen, J McCoy
CoRL 2022: Workshop on Strategic Multi-Agent Interactions, 2022
2022
Articles 1–7