Parallel hierarchical subspace clustering of categorical data

N Pang, J Zhang, C Zhang, X Qin - IEEE transactions on …, 2018 - ieeexplore.ieee.org
Parallel clustering is an important research area of big data analysis. The conventional
Hierarchical Agglomerative Clustering (HAC) techniques are inadequate to handle big-scale …

Parallel and efficient hierarchical k-median clustering

V Cohen-Addad, S Lattanzi… - Advances in …, 2021 - proceedings.neurips.cc
As a fundamental unsupervised learning task, hierarchical clustering has been extensively
studied in the past decade. In particular, standard metric formulations as hierarchical $ k …

A framework for parallelizing hierarchical clustering methods

S Lattanzi, T Lavastida, K Lu, B Moseley - Machine Learning and …, 2020 - Springer
Hierarchical clustering is a fundamental tool in data mining, machine learning and statistics.
Popular hierarchical clustering algorithms include top-down divisive approaches such as …

DHC: A distributed hierarchical clustering algorithm for large datasets

W Zhang, G Zhang, X Chen, Y Liu, X Zhou… - Journal of Circuits …, 2019 - World Scientific
Hierarchical clustering is a classical method to provide a hierarchical representation for the
purpose of data analysis. However, in practical applications, it is difficult to deal with massive …

A fast, scalable SLINK algorithm for commodity cluster computing exploiting spatial locality

P Goyal, S Kumari, S Sharma, D Kumar… - 2016 IEEE 18th …, 2016 - ieeexplore.ieee.org
Single linkage (SLINK) hierarchical clustering algorithm is a preferred clustering algorithm
over traditional partitioning-based clustering as it does not require the number of clusters as …

Design and implementation of a parallelized BIRCH algorithm based on Spark

李帅, 吴斌, 杜修明, 陈玉峰 - Computer Engineering & Science, 2017 - joces.nudt.edu.cn
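In an era where distributed computing and memory are king, Spark, a distributed framework built on in-memory
computing, has received unprecedented attention and adoption. This work focuses on the design and implementation of parallelizing the BIRCH algorithm on Spark; through theoretical performance analysis, the parallel …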

Scaling average-linkage via sparse cluster embeddings

T Lavastida, K Lu, B Moseley… - Asian Conference on …, 2021 - proceedings.mlr.press
Average-linkage is one of the most popular hierarchical clustering algorithms. It is well
known that average-linkage does not scale to large data sets due to the slow asymptotic …

Incremental entity resolution process over query results for data integration systems

PKM Vieira, BF Lóscio, AC Salgado - Journal of Intelligent Information …, 2019 - Springer
Entity Resolution (ER) in data integration systems is the problem of identifying groups of
tuples from one or multiple data sources that represent the same real-world entity. This is a …

Single‐linkage clustering of dynamic data

A Mongandampulath Akathoott… - … Practice and Experience, 2023 - Wiley Online Library
The surge in data sizes in fluid processing applications necessitates partitioning the data
into clusters and studying their representatives instead of studying each voxel data point. In …

Parallel SLINK for big data

P Goyal, S Kumari, S Sharma… - International Journal of …, 2020 - Springer
The major strength of hierarchical clustering algorithms is that it allows visual interpretations
of clusters through dendrograms. Users can cut the dendrogram at different levels to get …