Towards question-answering as an automatic metric for evaluating the content quality of a summary D Deutsch, T Bedrax-Weiss, D Roth Transactions of the Association for Computational Linguistics 9, 774-789, 2021 | 91 | 2021 |
A statistical analysis of summarization evaluation metrics using resampling methods D Deutsch, R Dror, D Roth Transactions of the Association for Computational Linguistics 9, 1132-1146, 2021 | 56 | 2021 |
Understanding the extent to which content quality metrics measure the information quality of summaries D Deutsch, D Roth Proceedings of the 25th Conference on Computational Natural Language …, 2021 | 35* | 2021 |
The devil is in the errors: Leveraging large language models for fine-grained machine translation evaluation P Fernandes, D Deutsch, M Finkelstein, P Riley, AFT Martins, G Neubig, ... arXiv preprint arXiv:2308.07286, 2023 | 34 | 2023 |
SacreROUGE: An open-source library for using and developing summarization evaluation metrics D Deutsch, D Roth arXiv preprint arXiv:2007.05374, 2020 | 31 | 2020 |
Re-examining system-level correlations of automatic summarization evaluation metrics D Deutsch, R Dror, D Roth arXiv preprint arXiv:2204.10216, 2022 | 30 | 2022 |
On the limitations of reference-free evaluations of generated text D Deutsch, R Dror, D Roth arXiv preprint arXiv:2210.12563, 2022 | 26 | 2022 |
Results of WMT23 metrics shared task: Metrics might be guilty but references are not innocent M Freitag, N Mathur, C Lo, E Avramidis, R Rei, B Thompson, T Kocmi, ... Proceedings of the Eighth Conference on Machine Translation, 578-628, 2023 | 25 | 2023 |
A general-purpose algorithm for constrained sequential inference D Deutsch, S Upadhyay, D Roth Proceedings of the 23rd Conference on Computational Natural Language …, 2019 | 17 | 2019 |
GEMv2: Multilingual NLG benchmarking in a single line of code S Gehrmann, A Bhattacharjee, A Mahendiran, A Wang, A Papangelis, ... arXiv preprint arXiv:2206.11249, 2022 | 15 | 2022 |
The Eval4NLP 2023 shared task on prompting large language models as explainable metrics C Leiter, J Opitz, D Deutsch, Y Gao, R Dror, S Eger arXiv preprint arXiv:2310.19792, 2023 | 13 | 2023 |
MetricX-23: The Google submission to the WMT 2023 metrics shared task J Juraska, M Finkelstein, D Deutsch, A Siddhant, M Mirzazadeh, M Freitag Proceedings of the Eighth Conference on Machine Translation, 756-767, 2023 | 12 | 2023 |
A distributional and orthographic aggregation model for English derivational morphology D Deutsch, J Hewitt, D Roth Proceedings of the 56th Annual Meeting of the Association for Computational …, 2018 | 12 | 2018 |
Ties matter: Meta-evaluating modern metrics with pairwise accuracy and tie calibration D Deutsch, G Foster, M Freitag arXiv preprint arXiv:2305.14324, 2023 | 11 | 2023 |
Summary cloze: A new task for content selection in topic-focused summarization D Deutsch, D Roth Proceedings of the 2019 Conference on Empirical Methods in Natural Language …, 2019 | 11 | 2019 |
Training and meta-evaluating machine translation evaluation metrics at the paragraph level D Deutsch, J Juraska, M Finkelstein, M Freitag arXiv preprint arXiv:2308.13506, 2023 | 10 | 2023 |
Incorporating question answering-based signals into abstractive summarization via salient span selection D Deutsch, D Roth arXiv preprint arXiv:2111.07935, 2021 | 10* | 2021 |
Ties matter: Modifying Kendall’s tau for modern metric meta-evaluation D Deutsch, G Foster, M Freitag arXiv preprint arXiv:2305.14324, 2023 | 7 | 2023 |
Needle in a haystack: An analysis of high-agreement workers on MTurk for summarization L Zhang, S Mille, Y Hou, D Deutsch, E Clark, Y Liu, S Mahamood, ... arXiv preprint arXiv:2212.10397, 2022 | 7 | 2022 |
Is killed more significant than fled? A contextual model for salient event detection D Jindal, D Deutsch, D Roth Proceedings of the 28th International Conference on Computational …, 2020 | 7 | 2020 |