A proposed method for minimizing mining tasks' data dimensionality

AM Idrees, WH Gomaa - International Journal of Intelligent …, 2020 - academia.edu
International Journal of Intelligent Engineering and Systems, 2020
Abstract
Knowledge discovery techniques have contributed significantly to many fields. However, as data volumes keep growing, these techniques run into processing bottlenecks. One way to mitigate this effect is to reduce data dimensionality by eliminating the attributes that have no significant effect on the accuracy of the discovery technique. This research proposes a novel method for reducing data dimensionality. The method rests on two pillars. The first applies an adapted Saaty method to determine the attributes' consistency, with a further adaptation aimed at a more accurate determination. The second applies clustering techniques to the consistent attributes and eliminates the least-weighted attributes in each cluster, which also have the lowest consistency measures. Together, the two steps highlight the most significant attributes of the dataset. The method was applied to the Gastrology dataset, whose attributes were reduced from 62 to 31. A set of classification techniques was then applied to the dataset to show that the dimensionality reduction retained classification accuracy; the Random Forest algorithm reached an accuracy of 95.36% on the adopted dataset.
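The abstract does not spell out the adapted Saaty method, so the following is only a minimal sketch of the standard (unadapted) Saaty consistency check from the Analytic Hierarchy Process, which the first pillar appears to build on. It assumes a reciprocal pairwise-comparison matrix over attributes is already available; how the authors construct that matrix, and their further adaptation, are not covered here. The function names `priority_weights` and `consistency_ratio` and the example matrices are hypothetical.

```python
# Standard Saaty (AHP) consistency check; NOT the paper's adapted variant.
# Saaty's random consistency index (RI) for matrix sizes 1..10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def priority_weights(matrix, iterations=100):
    """Approximate the principal eigenvector by power iteration.

    The result is normalized so the weights sum to 1.
    """
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * weights[j] for j in range(n))
               for i in range(n)]
        total = sum(nxt)
        weights = [v / total for v in nxt]
    return weights


def consistency_ratio(matrix):
    """Return (weights, CR) for a reciprocal pairwise-comparison matrix.

    CI = (lambda_max - n) / (n - 1) and CR = CI / RI(n).
    Matrices of size 1 or 2 are always consistent, so CR is 0 there.
    """
    n = len(matrix)
    if n not in RANDOM_INDEX:
        raise ValueError("RI table in this sketch covers sizes 1..10 only")
    weights = priority_weights(matrix)
    if n <= 2:
        return weights, 0.0
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return weights, ci / RANDOM_INDEX[n]


# Hypothetical comparison matrices over three attributes.
consistent = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
inconsistent = [[1, 2, 1 / 4], [1 / 2, 1, 2], [4, 1 / 2, 1]]

if __name__ == "__main__":
    for name, m in (("consistent", consistent), ("inconsistent", inconsistent)):
        w, cr = consistency_ratio(m)
        print(name, [round(x, 3) for x in w], round(cr, 3))
```

By Saaty's usual convention, a comparison matrix is treated as acceptably consistent when its ratio is below 0.10. Whether the paper uses this threshold, or a different one in its adaptation, is not stated in the abstract.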