K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data

AM Ikotun, AE Ezugwu, L Abualigah, B Abuhaija… - Information …, 2023 - Elsevier
Advances in recent techniques for scientific data collection in the era of big data allow for the
systematic accumulation of large quantities of data at various data-capturing sites. Similarly …

A review of data fusion techniques

F Castanedo - The Scientific World Journal, 2013 - Wiley Online Library
The integration of data and knowledge from several sources is known as data fusion. This
paper summarizes the state of the data fusion field and describes the most relevant studies …

Fast density peak clustering for large scale data based on kNN

Y Chen, X Hu, W Fan, L Shen, Z Zhang, X Liu… - Knowledge-Based …, 2020 - Elsevier
Density Peak (DPeak) clustering algorithm is not applicable for large scale data,
due to two quantities, i.e., ρ and δ, are both obtained by brute force algorithm with complexity …

Two improved k-means algorithms

SS Yu, SW Chu, CM Wang, YK Chan, TC Chang - Applied Soft Computing, 2018 - Elsevier
K-means algorithm is the most commonly used simple clustering method. For a large
number of high dimensional numerical data, it provides an efficient method for classifying …

Elastic machine learning algorithms in Amazon SageMaker

E Liberty, Z Karnin, B Xiang, L Rouesnel… - Proceedings of the …, 2020 - dl.acm.org
There is a large body of research on scalable machine learning (ML). Nevertheless, training
ML models on large, continuously evolving datasets is still a difficult and costly undertaking …

Sage: Self-tuning approximation for graphics engines

M Samadi, J Lee, DA Jamshidi, A Hormati… - Proceedings of the 46th …, 2013 - dl.acm.org
Approximate computing, where computation accuracy is traded off for better performance or
higher data throughput, is one solution that can help data processing keep pace with the …

Approximate k-means++ in sublinear time

O Bachem, M Lucic, SH Hassani… - Proceedings of the AAAI …, 2016 - ojs.aaai.org
The quality of K-Means clustering is extremely sensitive to proper initialization. The classic
remedy is to apply k-means++ to obtain an initial set of centers that is provably competitive …

k-means: A revisit

WL Zhao, CH Deng, CW Ngo - Neurocomputing, 2018 - Elsevier
Due to its simplicity and versatility, k-means remains popular since it was proposed three
decades ago. The performance of k-means has been enhanced from different perspectives …

An evolutionary algorithm for clustering data streams with a variable number of clusters

J de Andrade Silva, ER Hruschka, J Gama - Expert Systems with …, 2017 - Elsevier
Several algorithms for clustering data streams based on k-Means have been proposed in
the literature. However, most of them assume that the number of clusters, k, is known a priori …

State-of-the-art on clustering data streams

M Ghesmoune, M Lebbah, H Azzag - Big Data Analytics, 2016 - Springer
Clustering is a key data mining task. This is the problem of partitioning a set of observations
into clusters such that the intra-cluster observations are similar and the inter-cluster …