Imbalanced k-means: An algorithm to cluster imbalanced-distributed data



CNS Kumar, KN Rao, A Govardhan, KS Reddy

International Journal of Engineering and Technical Research, 2014 · academia.edu

Abstract

K-means is a partitional clustering technique that is well known and widely used for its low computational cost. However, the performance of the k-means algorithm tends to be affected by skewed data distributions, i.e., imbalanced data. It often produces clusters of relatively uniform sizes even when the input data have widely varying cluster sizes, a phenomenon called the “uniform effect.” In this paper, we analyze the causes of this effect and illustrate that it is particularly likely to occur in the k-means clustering process. As the minority class decreases in size, the uniform effect becomes more evident. To counteract the uniform effect, we revisit the well-known k-means algorithm and provide a general method to properly cluster imbalanced data. We present Imbalanced K-Means (IKM), a multi-purpose partitional clustering procedure that minimizes the clustering sum-of-squared-error criterion while imposing a hard sequentiality constraint in the clustering step. The proposed algorithm is built around a novel oversampling technique that removes noisy and weak instances from both the majority and minority classes and then oversamples only the novel minority instances. We conduct experiments on twelve UCI datasets from various application domains, comparing IKM with five algorithms on eight evaluation metrics. Experimental results show the effectiveness of the proposed algorithm in clustering both balanced and imbalanced data.
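To make the uniform effect concrete, here is a minimal sketch of plain k-means (Lloyd's iterations) in Python. It is the baseline the paper revisits, not IKM itself: the abstract does not specify the oversampling step or the sequentiality constraint, so neither is modeled. The criterion being minimized is SSE = Σ_k Σ_{x in C_k} (x − μ_k)², where μ_k is the mean of cluster C_k. The helper names `sse` and `kmeans_1d`, the toy 1-D data (101 evenly spaced majority points on [0, 10] plus 5 minority points at 12), and the initial centers 0 and 12 are all hypothetical choices made for illustration.

```python
from statistics import fmean


def sse(clusters):
    """Sum of squared errors: squared distance of each point to its cluster mean."""
    total = 0.0
    for pts in clusters:
        mu = fmean(pts)
        total += sum((x - mu) ** 2 for x in pts)
    return total


def kmeans_1d(data, centers, iters=100):
    """Plain Lloyd's k-means in 1-D (baseline only; hypothetical helper, not IKM).

    Alternates between assigning each point to its nearest center and moving
    each center to the mean of its assigned points, until the centers stop changing.
    """
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for x in data:
            # Nearest center by squared distance; ties go to the lower index.
            j = min(range(len(centers)), key=lambda i: (x - centers[i]) ** 2)
            clusters[j].append(x)
        # Keep an empty cluster's old center rather than dividing by zero.
        new = [fmean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new == centers:
            break
        centers = new
    return centers, clusters


if __name__ == "__main__":
    majority = [i / 10 for i in range(101)]  # hypothetical majority: 0.0, 0.1, ..., 10.0
    minority = [12.0] * 5                    # hypothetical minority: 5 points at 12
    # Deterministic initialization at the two extremes of the data.
    centers, clusters = kmeans_1d(majority + minority, [0.0, 12.0])
    print("cluster sizes:", [len(c) for c in clusters])
    print("SSE, true majority/minority partition:", round(sse([majority, minority]), 2))
    print("SSE, k-means partition:", round(sse(clusters), 2))
```

On this toy data the iterations settle on a 55/51 split in which the 5 minority points share a cluster with 46 majority points. That partition has an SSE of roughly 301, well below the 858.5 of the true majority/minority partition, so the SSE criterion itself rewards the more uniform split. This is the behavior the abstract attributes to k-means on imbalanced data.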



Cited by: 7
