| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| CORD-19: The COVID-19 open research dataset | LL Wang, K Lo, Y Chandrasekhar, R Reas, J Yang, D Eide, K Funk, ... | Workshop on NLP for COVID-19 | 958* | 2020 |
| How language model hallucinations can snowball | M Zhang, O Press, W Merrill, A Liu, NA Smith | arXiv preprint arXiv:2305.13534 | 163 | 2023 |
| ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension | S Subramanian, W Merrill, T Darrell, M Gardner, S Singh, A Rohrbach | Empirical Methods in Natural Language Processing | 85 | 2022 |
| Competency problems: On finding and removing artifacts in language data | M Gardner, W Merrill, J Dodge, ME Peters, A Ross, S Singh, N Smith | Empirical Methods in Natural Language Processing | 83 | 2021 |
| A formal hierarchy of RNN architectures | W Merrill, G Weiss, Y Goldberg, R Schwartz, NA Smith, E Yahav | Association for Computational Linguistics | 77 | 2020 |
| Saturated transformers are constant-depth threshold circuits | W Merrill, A Sabharwal, NA Smith | Transactions of the Association for Computational Linguistics 10, 843-856 | 72 | 2022 |
| Sequential neural networks as automata | W Merrill | Deep Learning and Formal Languages (ACL workshop) | 70 | 2019 |
| Provable limitations of acquiring meaning from ungrounded form: What will future language models understand? | W Merrill, Y Goldberg, R Schwartz, NA Smith | Transactions of the Association for Computational Linguistics 9, 1047-1060 | 64 | 2021 |
| Context-free transductions with neural stacks | Y Hao, W Merrill, D Angluin, R Frank, N Amsel, A Benz, S Mendelsohn | BlackboxNLP | 41 | 2018 |
| The Parallelism Tradeoff: Limitations of Log-Precision Transformers | W Merrill, A Sabharwal | Transactions of the Association for Computational Linguistics | 39 | 2022 |
| OLMo: Accelerating the science of language models | D Groeneveld, I Beltagy, P Walsh, A Bhagia, R Kinney, O Tafjord, AH Jha, ... | arXiv preprint arXiv:2402.00838 | 29 | 2024 |
| The Expressive Power of Transformers with Chain of Thought | W Merrill, A Sabharwal | ICLR 2024 | 29 | 2023 |
| Effects of parameter norm growth during transformer training: Inductive bias from gradient descent | W Merrill, V Ramanujan, Y Goldberg, R Schwartz, N Smith | Empirical Methods in Natural Language Processing | 28 | 2021 |
| A tale of two circuits: Grokking as competition of sparse and dense subnetworks | W Merrill, N Tsilivis, A Shukla | arXiv preprint arXiv:2303.11873 | 26 | 2023 |
| A Logic for Expressing Log-Precision Transformers | W Merrill, A Sabharwal | NeurIPS 2023 | 19* | 2022 |
| What formal languages can transformers express? A survey | L Strobl, W Merrill, G Weiss, D Chiang, D Angluin | Transactions of the Association for Computational Linguistics 12, 543-561 | 18* | 2024 |
| End-to-end graph-based TAG parsing with neural networks | J Kasai, R Frank, P Xu, W Merrill, O Rambow | NAACL | 15 | 2018 |
| Entailment Semantics Can Be Extracted from an Ideal Language Model | W Merrill, A Warstadt, T Linzen | CoNLL 2022 | 13 | 2022 |
| Formal language theory meets modern NLP | W Merrill | arXiv preprint arXiv:2102.10094 | 13 | 2021 |
| On the linguistic capacity of real-time counter automata | W Merrill | arXiv preprint arXiv:2004.06866 | 13 | 2020 |