On the opportunities and risks of foundation models R Bommasani, DA Hudson, E Adeli, R Altman, S Arora, S von Arx, ... arXiv preprint arXiv:2108.07258, 2021 | 3341 | 2021 |
Holistic evaluation of language models P Liang, R Bommasani, T Lee, D Tsipras, D Soylu, M Yasunaga, Y Zhang, ... arXiv preprint arXiv:2211.09110, 2022 | 774 | 2022 |
FEQA: A question answering evaluation framework for faithfulness assessment in abstractive summarization E Durmus, H He, M Diab ACL, 2020 | 366 | 2020 |
Benchmarking large language models for news summarization T Zhang, F Ladhak, E Durmus, P Liang, K McKeown, TB Hashimoto Transactions of the Association for Computational Linguistics 12, 39-57, 2024 | 234 | 2024 |
Whose opinions do language models reflect? S Santurkar, E Durmus, F Ladhak, C Lee, P Liang, T Hashimoto International Conference on Machine Learning, 29971-30004, 2023 | 213 | 2023 |
WikiLingua: A new benchmark dataset for cross-lingual abstractive summarization F Ladhak, E Durmus, C Cardie, K McKeown arXiv preprint arXiv:2010.03093, 2020 | 173 | 2020 |
Easily accessible text-to-image generation amplifies demographic stereotypes at large scale F Bianchi, P Kalluri, E Durmus, F Ladhak, M Cheng, D Nozza, ... Proceedings of the 2023 ACM Conference on Fairness, Accountability, and …, 2023 | 159 | 2023 |
The GEM benchmark: Natural language generation, its evaluation and metrics S Gehrmann, T Adewumi, K Aggarwal, PS Ammanamanchi, ... arXiv preprint arXiv:2102.01672, 2021 | 133 | 2021 |
Towards measuring the representation of subjective global opinions in language models E Durmus, K Nyugen, TI Liao, N Schiefer, A Askell, A Bakhtin, C Chen, ... arXiv preprint arXiv:2306.16388, 2023 | 90 | 2023 |
Evaluating human-language model interaction M Lee, M Srivastava, A Hardy, J Thickstun, E Durmus, A Paranjape, ... arXiv preprint arXiv:2212.09746, 2022 | 80 | 2022 |
Marked personas: Using natural language prompts to measure stereotypes in language models M Cheng, E Durmus, D Jurafsky arXiv preprint arXiv:2305.18189, 2023 | 78 | 2023 |
Exploring the role of prior beliefs for argument persuasion E Durmus, C Cardie NAACL, 2018 | 76 | 2018 |
Towards understanding sycophancy in language models M Sharma, M Tong, T Korbak, D Duvenaud, A Askell, SR Bowman, ... arXiv preprint arXiv:2310.13548, 2023 | 67 | 2023 |
Faithful or extractive? On mitigating the faithfulness-abstractiveness trade-off in abstractive summarization F Ladhak, E Durmus, H He, C Cardie, K McKeown arXiv preprint arXiv:2108.13684, 2021 | 62 | 2021 |
Studying large language model generalization with influence functions R Grosse, J Bae, C Anil, N Elhage, A Tamkin, A Tajdini, B Steiner, D Li, ... arXiv preprint arXiv:2308.03296, 2023 | 60 | 2023 |
Measuring faithfulness in chain-of-thought reasoning T Lanham, A Chen, A Radhakrishnan, B Steiner, C Denison, ... arXiv preprint arXiv:2307.13702, 2023 | 52 | 2023 |
Question decomposition improves the faithfulness of model-generated reasoning A Radhakrishnan, K Nguyen, A Chen, C Chen, C Denison, D Hernandez, ... arXiv preprint arXiv:2307.11768, 2023 | 43* | 2023 |
Exploring the Role of Argument Structure in Online Debate Persuasion J Li, E Durmus, C Cardie EMNLP, 2020 | 43 | 2020 |
Persuasion of the Undecided: Language vs. the Listener. L Longpre, E Durmus, C Cardie Proceedings of the 6th Workshop on Argument Mining, 2019 | 35 | 2019 |
A corpus for modeling user and language effects in argumentation on online debating E Durmus, C Cardie ACL, 2019 | 32 | 2019 |