Comparing automatic and human evaluation of NLG systems A Belz, E Reiter 11th Conference of the European Chapter of the Association for Computational …, 2006 | 264 | 2006 |
Automatic generation of weather forecast texts using comprehensive probabilistic generation-space models A Belz Natural Language Engineering 14 (4), 431-455, 2008 | 258 | 2008 |
An investigation into the validity of some metrics for automatically evaluating natural language generation systems E Reiter, A Belz Computational Linguistics 35 (4), 529-558, 2009 | 237 | 2009 |
Twenty Years of Confusion in Human Evaluation: NLG needs evaluation sheets and standardised definitions D Howcroft, A Belz, D Gkatzia, S Hasan, S Mahamood, S Mille, M Clinciu, ... International Natural Language Generation Conference 2020 (INLG'20), 2020 | 182 | 2020 |
The first surface realisation shared task: Overview and evaluation results A Belz, M White, D Espinosa, E Kow, D Hogan, A Stent Proceedings of the 13th European Workshop on Natural Language Generation …, 2011 | 109 | 2011 |
The TUNA-REG Challenge 2009: Overview and evaluation results A Gatt, A Belz, E Kow Association for Computational Linguistics, 2009 | 79 | 2009 |
Introducing shared tasks to NLG: The TUNA shared task evaluation challenges A Gatt, A Belz Conference of the European Association for Computational Linguistics, 264-293, 2009 | 78 | 2009 |
A Systematic Review of Reproducibility Research in Natural Language Processing A Belz, S Agarwal, A Shimorina, E Reiter EACL'21, 2021 | 71 | 2021 |
Intrinsic vs. extrinsic evaluation measures for referring expression generation A Belz, A Gatt Proceedings of ACL-08: HLT, Short Papers, 197-200, 2008 | 71 | 2008 |
The First Multilingual Surface Realisation Shared Task (SR'18): Overview and Evaluation Results S Mille, A Belz, B Bohnet, Y Graham, E Pitler, L Wanner Proceedings of the ACL'18 Workshop on Multilingual Surface Realisation …, 2018 | 68 | 2018 |
Disentangling the Properties of Human Evaluation Methods: A Classification System to Support Comparability, Meta-Evaluation and Reproducibility Testing A Belz, S Mille, D Howcroft International Natural Language Generation Conference 2020 (INLG'20), 2020 | 62 | 2020 |
The TUNA challenge 2008: Overview and evaluation results A Gatt, A Belz, E Kow Association for Computational Linguistics, 2008 | 62 | 2008 |
The attribute selection for GRE challenge: Overview and evaluation results A Belz, A Gatt Proceedings of the Workshop on Using Corpora for Natural Language Generation, 2007 | 58 | 2007 |
ITRI-02-04 PILLS: Multilingual generation of medical information documents with overlapping content N Bouayad-Agha, R Power, D Scott, A Belz Proceedings of LREC 2002, 22-31, 2002 | 51 | 2002 |
That's nice… what can you do with it? A Belz Computational Linguistics 35 (1), 2009 | 50 | 2009 |
Generating referring expressions in context: The GREC task evaluation challenges A Belz, E Kow, J Viethen, A Gatt Conference of the European Association for Computational Linguistics, 294-327, 2009 | 46 | 2009 |
The Second Multilingual Surface Realisation Shared Task (SR'19): Overview and Evaluation Results S Mille, A Belz, B Bohnet, Y Graham, L Wanner Proceedings of the 2nd Workshop on Multilingual Surface Realisation, 2019 | 43 | 2019 |
Comparing rating scales and preference judgements in language evaluation A Belz, E Kow Proceedings of the 6th International Natural Language Generation Conference, 2010 | 40 | 2010 |
The GREC challenges 2010: overview and evaluation results A Belz, E Kow Proceedings of the 6th International Natural Language Generation Conference, 2010 | 40 | 2010 |
The ReproGen Shared Task on Reproducibility of Human Evaluations in NLG: Overview and Results A Belz, A Shimorina, S Agarwal, E Reiter Proceedings of the 14th International Natural Language Generation Conference …, 2021 | 38 | 2021 |