K-means clustering versus validation measures: a data distribution perspective

H Xiong, J Wu, J Chen - Proceedings of the 12th ACM SIGKDD …, 2006 - dl.acm.org
K-means is a widely used partitional clustering method. While there are considerable
research efforts to characterize the key features of K-means clustering, further investigation …

Cluster analysis and K-means clustering: an introduction

J Wu, J Wu - Advances in K-Means clustering: A data mining …, 2012 - Springer
The phrase “data mining” was termed in the late eighties of the last century, which describes
the activity that attempts to extract interesting patterns from data. Since then, data mining and …

An improved ant algorithm with LDA-based representation for text document clustering

A Onan, H Bulut, S Korukoglu - Journal of Information …, 2017 - journals.sagepub.com
Document clustering can be applied in document organisation and browsing, document
summarisation and classification. The identification of an appropriate representation for …

An introduction to Johnson-Lindenstrauss transforms

CB Freksen - arXiv preprint arXiv:2103.00564, 2021 - arxiv.org
Johnson-Lindenstrauss Transforms are powerful tools for reducing the dimensionality of
data while preserving key characteristics of that data, and they have found use in many …

Towards understanding hierarchical clustering: A data distribution perspective

J Wu, H Xiong, J Chen - Neurocomputing, 2009 - Elsevier
A very important category of clustering methods is hierarchical clustering. There are
considerable research efforts which have been focused on algorithm-level improvements of …

An online document clustering technique for short web contents

M Carullo, E Binaghi, I Gallo - Pattern Recognition Letters, 2009 - Elsevier
Document clustering techniques have been applied in several areas, with the web as one of
the most recent and influential. Both general-purpose and text-oriented techniques exist and …

Research of fast SOM clustering for text information

Y Liu, C Wu, M Liu - Expert Systems with Applications, 2011 - Elsevier
The state-of-the-art text clustering methods suffer from the huge size of documents with high-
dimensional features. In this paper, we studied fast SOM clustering technology for Text …

Combining semantic and term frequency similarities for text clustering

VHA Soares, RJGB Campello… - … and Information Systems, 2019 - Springer
A key challenge for document clustering consists in finding a proper similarity measure for
text documents that enables the generation of cohesive groups. Measures based on the …

CDIM: document clustering by discrimination information maximization

MT Hassan, A Karim, JB Kim, M Jeon - Information Sciences, 2015 - Elsevier
Ideally, document clustering methods should produce clusters that are semantically relevant
and readily understandable as collections of documents belonging to particular contexts or …

Singular Value Decomposition for dimensionality reduction in unsupervised text learning problems

TF Abidin, B Yusuf, M Umran - 2010 2nd International …, 2010 - ieeexplore.ieee.org
Partitioning vast amounts of text documents is a challenging problem due to a high
dimensional representation of the documents. In this study, we investigate the quality of text …