Fast and accurate k-means for large datasets

K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data

AM Ikotun, AE Ezugwu, L Abualigah, B Abuhaija… - Information …, 2023 - Elsevier

Advances in recent techniques for scientific data collection in the era of big data allow for the
systematic accumulation of large quantities of data at various data-capturing sites. Similarly …

Cited by 616 Related articles All 3 versions

A review of data fusion techniques

F Castanedo - The scientific world journal, 2013 - Wiley Online Library

The integration of data and knowledge from several sources is known as data fusion. This
paper summarizes the state of the data fusion field and describes the most relevant studies …

Cited by 1147 Related articles All 14 versions

Fast density peak clustering for large scale data based on kNN

Y Chen, X Hu, W Fan, L Shen, Z Zhang, X Liu… - Knowledge-Based …, 2020 - Elsevier

Density Peak (DPeak) clustering algorithm is not applicable for large scale data,
due to two quantities, i.e., ρ and δ, are both obtained by brute force algorithm with complexity …

Cited by 239 Related articles All 5 versions

Two improved k-means algorithms

SS Yu, SW Chu, CM Wang, YK Chan, TC Chang - Applied Soft Computing, 2018 - Elsevier

K-means algorithm is the most commonly used simple clustering method. For a large
number of high dimensional numerical data, it provides an efficient method for classifying …

Cited by 233 Related articles All 2 versions

Elastic machine learning algorithms in Amazon SageMaker

E Liberty, Z Karnin, B Xiang, L Rouesnel… - Proceedings of the …, 2020 - dl.acm.org

There is a large body of research on scalable machine learning (ML). Nevertheless, training
ML models on large, continuously evolving datasets is still a difficult and costly undertaking …

Cited by 135 Related articles All 10 versions

Sage: Self-tuning approximation for graphics engines

M Samadi, J Lee, DA Jamshidi, A Hormati… - Proceedings of the 46th …, 2013 - dl.acm.org

Approximate computing, where computation accuracy is traded off for better performance or
higher data throughput, is one solution that can help data processing keep pace with the …

Cited by 355 Related articles All 14 versions

Approximate k-means++ in sublinear time

O Bachem, M Lucic, SH Hassani… - Proceedings of the AAAI …, 2016 - ojs.aaai.org

The quality of K-Means clustering is extremely sensitive to proper initialization. The classic
remedy is to apply k-means++ to obtain an initial set of centers that is provably competitive …

Cited by 170 Related articles All 11 versions

k-means: A revisit

WL Zhao, CH Deng, CW Ngo - Neurocomputing, 2018 - Elsevier

Due to its simplicity and versatility, k-means remains popular since it was proposed three
decades ago. The performance of k-means has been enhanced from different perspectives …

Cited by 105 Related articles All 4 versions

An evolutionary algorithm for clustering data streams with a variable number of clusters

J de Andrade Silva, ER Hruschka, J Gama - Expert Systems with …, 2017 - Elsevier

Several algorithms for clustering data streams based on k-Means have been proposed in
the literature. However, most of them assume that the number of clusters, k, is known a priori …

Cited by 107 Related articles All 10 versions

State-of-the-art on clustering data streams

M Ghesmoune, M Lebbah, H Azzag - Big Data Analytics, 2016 - Springer

Clustering is a key data mining task. This is the problem of partitioning a set of observations
into clusters such that the intra-cluster observations are similar and the inter-cluster …

Cited by 95 Related articles All 9 versions