Undersampled K-means approach for handling imbalanced distributed data

NS Kumar, KN Rao, A Govardhan, KS Reddy… - Progress in Artificial …, 2014 - Springer
Progress in Artificial Intelligence, 2014 - Springer
Abstract
K-means is a well-known partitional clustering technique that is widely used for its low computational cost. However, the performance of the K-means algorithm tends to be affected by skewed data distributions, i.e., imbalanced data. It often produces clusters of relatively uniform size even when the input data have clusters of varied sizes, a phenomenon called the “uniform effect”. In this paper, we analyze the causes of this effect and illustrate that it probably occurs more in the K-means clustering process. As the minority class decreases in size, the “uniform effect” becomes evident. To counter the “uniform effect”, we revisit the well-known K-means algorithm and provide a general method to properly cluster imbalanced distributed data. The proposed algorithm consists of a novel undersampling technique implemented by intelligently removing noisy and weak instances from the majority class. We conduct experiments on twelve UCI datasets from various application domains, using five algorithms for comparison on eight evaluation metrics. Experimental results show the effectiveness of the proposed clustering algorithm in clustering balanced and imbalanced data.
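
The general idea of undersampling the majority class before running K-means can be sketched as follows. This is only an illustration, not the authors' algorithm: the abstract does not say how noisy and weak instances are identified, so the sketch assumes a stand-in criterion (drop the majority points farthest from the majority mean). The function names `kmeans` and `undersample_majority`, the `keep_fraction` parameter, and the synthetic two-class data are all hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means on a list of coordinate tuples.

    Returns (centroids, labels). Initial centroids are sampled from the data.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def undersample_majority(majority, keep_fraction=0.5):
    """ASSUMED criterion, not from the paper: keep only the majority
    points closest to the majority-class mean."""
    center = tuple(sum(c) / len(majority) for c in zip(*majority))
    ranked = sorted(majority, key=lambda p: math.dist(p, center))
    return ranked[: max(1, int(len(majority) * keep_fraction))]


# Synthetic imbalanced data (hypothetical): 500 majority vs 25 minority points.
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
minority = [(rng.gauss(4, 0.5), rng.gauss(4, 0.5)) for _ in range(25)]

reduced = undersample_majority(majority, keep_fraction=0.2)
centroids, labels = kmeans(reduced + minority, k=2)
sizes = [labels.count(j) for j in range(2)]
print("majority kept:", len(reduced), "cluster sizes:", sizes)
```

The sketch assumes class labels are available so that the majority class can be reduced before clustering. Comparing the cluster sizes obtained with and without the undersampling step on the same data is one simple way to observe how class imbalance influences the partition.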