Alex Tamkin
Research Scientist, Anthropic
Verified email at cs.stanford.edu
Title
Cited by
Year
On the opportunities and risks of foundation models
R Bommasani, DA Hudson, E Adeli, R Altman, S Arora, S von Arx, ...
arXiv preprint arXiv:2108.07258, 2021
Cited by 3496, 2021
Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models
A Tamkin, M Brundage, J Clark, D Ganguli
arXiv preprint arXiv:2102.02503, https://arxiv.org/abs/2102.02503, 2021
Cited by 272, 2021
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
T Bricken, A Templeton, J Batson, B Chen, A Jermyn, T Conerly, ...
https://transformer-circuits.pub/2023/monosemantic-features/index.html, 2023
Cited by 123, 2023
Towards measuring the representation of subjective global opinions in language models
E Durmus, K Nguyen, TI Liao, N Schiefer, A Askell, A Bakhtin, C Chen, ...
arXiv preprint arXiv:2306.16388, 2023
Cited by 101, 2023
Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy
R Keramati, C Dann, A Tamkin, E Brunskill
Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), 2020
Cited by 87, 2020
Studying large language model generalization with influence functions
R Grosse, J Bae, C Anil, N Elhage, A Tamkin, A Tajdini, B Steiner, D Li, ...
arXiv preprint arXiv:2308.03296, 2023
Cited by 70, 2023
Viewmaker Networks: Learning Views for Unsupervised Representation Learning
A Tamkin, M Wu, N Goodman
ICLR 2021, 2020
Cited by 68, 2020
Drone.io: A Gestural and Visual Interface for Human-Drone Interaction
JR Cauchard, A Tamkin, CY Wang, L Vink, M Park, T Fang, JA Landay
2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI …, 2019
Cited by 57, 2019
Investigating transferability in pretrained language models
A Tamkin, T Singh, D Giovanardi, N Goodman
Findings of EMNLP 2020, 2020
Cited by 44, 2020
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
A Templeton, T Conerly, J Marcus, J Lindsey, T Bricken, B Chen, ...
Transformer Circuits Thread, 2024
Cited by 41, 2024
Language Through a Prism: A Spectral Approach for Multiscale Language Representations
A Tamkin, D Jurafsky, N Goodman
NeurIPS 2020, 2020
Cited by 38, 2020
Active Learning Helps Pretrained Models Learn the Intended Task
A Tamkin, D Nguyen, S Deshpande, J Mu, N Goodman
NeurIPS 2022, 2022
Cited by 36, 2022
DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning
A Tamkin, V Liu, R Lu, D Fein, C Schultz, N Goodman
NeurIPS 2021, 2021
Cited by 36, 2021
Distributionally-Aware Exploration for CVaR Bandits
A Tamkin, R Keramati, C Dann, E Brunskill
NeurIPS 2019 Workshop on Safety and Robustness in Decision Making, 2019
Cited by 36, 2019
Recursive Routing Networks: Learning to Compose Modules for Language Understanding
I Cases, C Rosenbaum, M Riemer, A Geiger, T Klinger, A Tamkin, O Li, ...
NAACL 2019, 2019
Cited by 29, 2019
Evaluating and mitigating discrimination in language model decisions
A Tamkin, A Askell, L Lovitt, E Durmus, N Joseph, S Kravec, K Nguyen, ...
arXiv preprint arXiv:2312.03689, 2023
Cited by 27, 2023
C5T5: Controllable generation of organic molecules with transformers
D Rothchild, A Tamkin, J Yu, U Misra, J Gonzalez
arXiv preprint arXiv:2108.10307, 2021
Cited by 26, 2021
Eliciting human preferences with language models
BZ Li, A Tamkin, N Goodman, J Andreas
arXiv preprint arXiv:2310.11589, 2023
Cited by 24, 2023
Task Ambiguity in Humans and Language Models
A Tamkin, K Handa, A Shrestha, N Goodman
ICLR 2023, 2023
Cited by 23, 2023
Many-shot jailbreaking
C Anil, E Durmus, M Sharma, J Benton, S Kundu, J Batson, N Rimsky, ...
Anthropic, April, 2024
Cited by 18, 2024
Articles 1–20