A rule based approach to word lemmatization

J Plisson, N Lavrac, D Mladenic - Proceedings of IS, 2004 - Citeseer
Abstract
Lemmatization is the process of finding the normalized form of a word. It amounts to finding a transformation that, applied to a word, yields its normalized form. The approach presented in this paper focuses on word endings: which suffix should be removed and/or added to obtain the normalized form. This paper compares the results of two word lemmatization algorithms, one based on if-then rules and the other based on ripple down rules induction algorithms. It presents the problem of lemmatizing words from Slovene free text and explains why the Ripple Down Rules (RDR) approach is well suited to the task. When learning from a corpus of lemmatized Slovene words, the RDR approach yields easy-to-understand rules with higher classification accuracy than the rule learning results of previous work.
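The suffix-transformation view described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the names `derive_transform`, `apply_transform`, `RDRNode`, and `lemmatize` are hypothetical, the toy rules use English words rather than Slovene, and the exception structure is a simplified, hand-written stand-in for ripple down rules, not the induction algorithm the authors evaluate.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A transformation is a pair: (suffix to remove, suffix to add).
Transform = Tuple[str, str]


def derive_transform(word: str, lemma: str) -> Transform:
    """Derive the suffix transformation that maps `word` to `lemma`
    by stripping their longest common prefix."""
    i = 0
    while i < min(len(word), len(lemma)) and word[i] == lemma[i]:
        i += 1
    return word[i:], lemma[i:]


def apply_transform(word: str, transform: Transform) -> str:
    """Remove the old suffix and append the new one.
    The word is returned unchanged if it does not end with the old suffix."""
    remove, add = transform
    if not word.endswith(remove):
        return word
    return word[:len(word) - len(remove)] + add


@dataclass
class RDRNode:
    """A rule 'if the word ends with `suffix`, use `transform`',
    refined by more specific exception rules (hypothetical structure)."""
    suffix: str
    transform: Transform
    exceptions: List["RDRNode"] = field(default_factory=list)

    def classify(self, word: str) -> Optional[Transform]:
        if not word.endswith(self.suffix):
            return None
        # The first exception that fires overrides this rule's conclusion.
        for exc in self.exceptions:
            result = exc.classify(word)
            if result is not None:
                return result
        return self.transform


# Hand-written toy rules (assumed, English): the default keeps the word as is,
# a final "s" is dropped, except that "ies" becomes "y".
ROOT = RDRNode("", ("", ""), [
    RDRNode("s", ("s", ""), [
        RDRNode("ies", ("ies", "y")),
    ]),
])


def lemmatize(word: str) -> str:
    transform = ROOT.classify(word)
    # ROOT has an empty suffix, so it always matches; the fallback is defensive.
    return apply_transform(word, transform if transform is not None else ("", ""))


if __name__ == "__main__":
    print(derive_transform("cities", "city"))
    for w in ("cities", "cats", "dog"):
        print(w, "->", lemmatize(w))
```

In this sketch each exception rule only refines the rule it hangs under, which is what keeps such rule sets readable: a general rule stays in place and a narrower case is added beneath it when the general rule gives the wrong lemma.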