Big data clustering techniques based on spark: a literature review

MM Saeed, Z Al Aghbari, M Alsharidah - PeerJ Computer Science, 2020 - peerj.com
A popular unsupervised learning method, known as clustering, is extensively used in data
mining, machine learning and pattern recognition. The procedure involves grouping of …

BLOCK-DBSCAN: Fast clustering for large scale data

Y Chen, L Zhou, N Bouguila, C Wang, Y Chen, J Du - Pattern Recognition, 2021 - Elsevier
We analyze the drawbacks of DBSCAN and its variants, and find the grid technique, which is
used in Fast-DBSCAN and ρ-approximate DBSCAN, is almost useless in high dimensional …
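The snippet's point about grids in high dimensions can be made concrete with a back-of-the-envelope count. This sketch assumes the common grid layout in which each cell has side ε/√d, so any two points in the same cell are within ε. It also assumes a non-strict "≤ ε" neighbour test. It computes a loose box upper bound on how many cells around a point's own cell could contain ε-neighbours. It is not the exact procedure of either paper, and the function name is hypothetical.

```python
import math


def grid_neighbor_cells_upper_bound(d: int) -> int:
    """Loose upper bound on the number of grid cells to inspect around a
    point's own cell, assuming cells of side eps/sqrt(d) in d dimensions.

    Along one axis, a cell offset by k cells is at least (|k| - 1) * eps/sqrt(d)
    away, so it can hold an eps-neighbour only if |k| <= floor(sqrt(d)) + 1.
    Allowing that offset range independently on every axis gives a box of
    (2 * (floor(sqrt(d)) + 1) + 1) ** d cells.
    """
    reach = math.isqrt(d) + 1  # largest useful per-axis cell offset
    return (2 * reach + 1) ** d


for d in (1, 2, 4, 8, 16):
    print(d, grid_neighbor_cells_upper_bound(d))
```

For d = 2 the bound is 25 cells. By d = 16 it exceeds 10^16, far more than the number of points in any realistic dataset. So in high dimensions almost all of these cells are empty, and enumerating them costs more than comparing points directly.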

RNN-DBSCAN: A density-based clustering algorithm using reverse nearest neighbor density estimates

A Bryant, K Cios - IEEE Transactions on Knowledge and Data …, 2017 - ieeexplore.ieee.org
A new density-based clustering algorithm, RNN-DBSCAN, is presented which uses reverse
nearest neighbor counts as an estimate of observation density. Clustering is performed …
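The density estimate described in the snippet, reverse nearest-neighbour counts, can be sketched by brute force. For each point, the sketch counts how many other points include it among their k nearest neighbours. The snippet is truncated before it explains how RNN-DBSCAN uses these counts to form clusters, so only the counting step is shown here. The function name, the Euclidean metric and the index-based tie-breaking are assumptions made for illustration.

```python
import math


def reverse_knn_counts(points, k):
    """For each point, count how many *other* points list it among their
    k nearest neighbours (brute force, O(n^2 log n), Euclidean distance)."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Rank all other points by distance to point i; ties are broken by
        # index so the result is deterministic.
        others = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for _, j in others[:k]:
            counts[j] += 1  # point i "votes" for each of its k nearest neighbours
    return counts


pts = [(0.0,), (1.0,), (3.0,), (10.0,)]
print(reverse_knn_counts(pts, k=1))
```

With k = 1 the isolated point at 10.0 receives no votes. The point at 1.0 is the nearest neighbour of two others, so it gets the highest count, which matches the intuition that a high reverse-neighbour count signals a dense region.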

KNN-BLOCK DBSCAN: Fast clustering for large-scale data

Y Chen, L Zhou, S Pei, Z Yu, Y Chen… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
Large-scale data clustering is an essential key for big data problem. However, no current
existing approach is “optimal” for big data due to high complexity, which remains it a great …

A comprehensive survey on cloud data mining (CDM) frameworks and algorithms

HB Barua, KC Mondal - ACM Computing Surveys (CSUR), 2019 - dl.acm.org
Data mining is used for finding meaningful information out of a vast expanse of data. With
the advent of Big Data concept, data mining has come to much more prominence …

An improved DBSCAN algorithm based on the neighbor similarity and fast nearest neighbor query

SS Li - IEEE Access, 2020 - ieeexplore.ieee.org
DBSCAN is the most famous density based clustering algorithm which is one of the main
clustering paradigms. However, there are many redundant distance computations among …
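The cost problem the snippet raises can be seen in a minimal sketch of classic DBSCAN with brute-force region queries and a counter of distance evaluations. Every point is queried exactly once, so n points cost n² distance computations. Each unordered pair is evaluated twice, as d(i, j) and d(j, i). That is one obvious kind of redundancy. The snippet is truncated, so it does not say which redundancy Li's method removes. The function name, the convention that `min_pts` counts the point itself, and the label `-1` for noise are illustrative assumptions.

```python
import math


def dbscan(points, eps, min_pts):
    """Classic DBSCAN with brute-force region queries.

    Returns (labels, n_distance_calls). Labels are cluster ids 0, 1, ...
    or -1 for noise. min_pts counts the query point itself.
    """
    n = len(points)
    calls = 0

    def region_query(i):
        # Linear scan: one distance computation per point in the dataset.
        nonlocal calls
        out = []
        for j in range(n):
            calls += 1
            if math.dist(points[i], points[j]) <= eps:
                out.append(j)
        return out

    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins cluster, not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = region_query(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(nbrs)
        cluster += 1
    return labels, calls


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels, calls = dbscan(pts, eps=1.5, min_pts=3)
print(labels, calls)
```

The two tight groups become clusters 0 and 1, and the isolated point is labelled noise. The counter reports 7 × 7 = 49 distance evaluations, including self-distances and both orderings of every pair. Spatial indexes and neighbour-reuse schemes aim to cut exactly this cost.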

A novel density-based clustering algorithm using nearest neighbor graph

H Li, X Liu, T Li, R Gan - Pattern Recognition, 2020 - Elsevier
Density-based clustering has several desirable properties, such as the abilities to handle
and identify noise samples, discover clusters of arbitrary shapes, and automatically discover …

Clustering with local density peaks-based minimum spanning tree

D Cheng, Q Zhu, J Huang, Q Wu… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
Clustering analysis has been widely used in statistics, machine learning, pattern recognition,
image processing, and so on. It is a great challenge for most existing clustering algorithms to …

Theoretically-efficient and practical parallel DBSCAN

Y Wang, Y Gu, J Shun - Proceedings of the 2020 ACM SIGMOD …, 2020 - dl.acm.org
The DBSCAN method for spatial clustering has received significant attention due to its
applicability in a variety of data analysis tasks. There are fast sequential algorithms for …

AA-DBSCAN: an approximate adaptive DBSCAN for finding clusters with varying densities

JH Kim, JH Choi, KH Yoo, A Nasridinov - The Journal of Supercomputing, 2019 - Springer
Clustering is a typical data mining technique that partitions a dataset into multiple subsets of
similar objects according to similarity metrics. In particular, density-based algorithms can find …