Structured named entities in two distinct press corpora: Contemporary broadcast news and old newspapers

S Rosset, C Grouin, K Fort, O Galibert… - Proceedings of the …, 2012 - aclanthology.org
This paper compares the reference annotation of structured named entities in two corpora
with different origins and properties. It addresses two questions linked to such a comparison.
On the one hand, what specific issues were raised by reusing the same annotation scheme
on a corpus that differs from the first in terms of media and that predates it by more than a
century? On the other hand, what contrasts were observed in the resulting annotations
across the two corpora?

Structured Named Entities in two distinct press corpora: Contemporary Broadcast News and Old Newspapers

S Rosset, C Grouin, O Galibert, J Kahn… - pdfs.semanticscholar.org
Source material (more problems in OP corpus): OCR errors that do not appear: → “touché” (touched) instead of “Fouché” (last name); combined entities: “M. Montmerqué, ingénieur”.
Language (OP corpus is more difficult): specific languages: religious language, abbreviations; cultural context: geographical divisions from 1890 → Tonkin: country (loc. adm. nat) or region (loc. adm. reg)?
Annotation difficulties: boundary delimitation more difficult: loc. adm. nat name