Graph-based n-gram language identification on short texts

E Tromp, M Pechenizkiy
Proc. 20th Machine Learning conference of Belgium and The Netherlands, 2011
Abstract
Language identification (LI) is an important task in natural language processing. Several machine learning approaches have been proposed for this problem, but most of them assume relatively long, well-written texts. We propose a graph-based N-gram approach for LI, called LIGA, which targets relatively short and ill-written texts. The results of our experimental study show that LIGA outperforms the state-of-the-art N-gram approach for LI on Twitter messages.