Authors
Eddy Muntina Dharma, F Lumban Gaol, HLHS Warnars, BENFANO Soewito
Publication date
2022/1/31
Journal
J Theor Appl Inf Technol
Volume
100
Issue
2
Pages
31
Description
Feature extraction in text processing, or Natural Language Processing (NLP), poses its own challenges because text is unstructured. The choice of feature extraction method can therefore affect classification performance. This study compares the accuracy of three word embedding methods, namely Word2Vec, GloVe, and FastText, for text classification using a Convolutional Neural Network (CNN). These three methods were chosen because they can capture semantic and syntactic information, word order, and even the context around words. Their accuracy was compared on news classification using a data set taken from the UCI KDD Archive, which contains 19,977 news stories grouped into 20 news topics. The results show that word embedding with FastText achieves the highest accuracy in the classification process. However, the difference in accuracy among the three methods is not substantial, so it can be concluded that the choice of method depends on the data set used.