Structured named entities in two distinct press corpora: Contemporary broadcast news and old newspapers
This paper compares the reference annotation of structured named entities in two corpora
with different origins and properties. It addresses two questions linked to such a comparison.
On the one hand, what specific issues were raised by reusing the same annotation scheme
on a corpus that differs from the first in terms of media and that predates it by more than a
century? On the other hand, what contrasts were observed in the resulting annotations
across the two corpora?
Structured Named Entities in two distinct press corpora: Contemporary Broadcast News and Old Newspapers
S Rosset, C Grouin, O Galibert, J Kahn… - pdfs.semanticscholar.org
Source material (more problems in the OP corpus): OCR errors that do not appear: “touché” (touched) instead of “Fouché” (last name); combined entities: “M. Montmerqué, ingénieur”.
Language (the OP corpus is more difficult): specific languages: religious language, abbreviations; cultural context: geographical divisions from 1890, e.g. Tonkin: country (loc. adm. nat) or region (loc. adm. reg)?
Annotation difficulties: boundary delimitation more difficult: loc. adm. nat name