A comprehensive survey of clustering algorithms
Data analysis is used as a common method in modern science research, which is across
communication science, computer science and biology science. Clustering, as the basic …
A survey on unsupervised outlier detection in high‐dimensional numerical data
High‐dimensional data in Euclidean space pose special challenges to data mining
algorithms. These challenges are often indiscriminately subsumed under the term 'curse of …
On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study
The evaluation of unsupervised outlier detection algorithms is a constant challenge in data
mining research. Little is known regarding the strengths and weaknesses of different …
Ensembles for unsupervised outlier detection: challenges and research questions a position paper
Ensembles for unsupervised outlier detection is an emerging topic that has been neglected
for a surprisingly long time (although there are reasons why this is more difficult than …
Density-based clustering validation
One of the most challenging aspects of clustering is validation, which is the objective and
quantitative assessment of clustering results. A number of different relative validity criteria …
The (black) art of runtime evaluation: Are we comparing algorithms or implementations?
Any paper proposing a new algorithm should come with an evaluation of efficiency and
scalability (particularly when we are designing methods for “big data”). However, there are …
Validation of cluster analysis results on validation data: A systematic framework
Cluster analysis refers to a wide range of data analytic techniques for class discovery and is
popular in many application fields. To assess the quality of a clustering result, different …
A survey on enhanced subspace clustering
Subspace clustering finds sets of objects that are homogeneous in subspaces of high-
dimensional datasets, and has been successfully applied in many domains. In recent years …
A collection of benchmark datasets for systematic evaluations of machine learning on the semantic web
In the recent years, several approaches for machine learning on the Semantic Web have
been proposed. However, no extensive comparisons between those approaches have been …
On using classification datasets to evaluate graph outlier detection: Peculiar observations and new insights
It is common practice of the outlier mining community to repurpose classification datasets
toward evaluating various detection models. To that end, often a binary classification dataset …