Fast and accurate k-means for large datasets
M Shindler, A Wong… - Advances in neural …, 2011 - proceedings.neurips.cc
Clustering is a popular problem with many applications. We consider the k-means problem
in the situation where the data is too large to be stored in main memory and must be
accessed sequentially, such as from a disk, and where we must use as little memory as
possible. Our algorithm is based on recent theoretical results, with significant improvements
to make it practical. Our approach greatly simplifies a recently developed
algorithm, both in design and in analysis, and eliminates large constant factors in the …
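The abstract's setting (one sequential pass over data that does not fit in memory, using as little memory as possible) can be illustrated with online facility location. The second result's outline also refers to "facilities". The Python below is a minimal sketch under stated assumptions, not the authors' algorithm. The function name `stream_summary`, the parameters `max_facilities`, `f0` and `growth`, the opening rule (open a facility with probability min(w·d²/f, 1)), and the "multiply f and re-summarize" schedule are all hypothetical choices made for illustration. The sketch keeps only a bounded set of weighted points in memory while reading each input point once.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def stream_summary(points, max_facilities, f0=1.0, growth=2.0, seed=0):
    """Hypothetical one-pass summary via online facility location.

    Assumptions (not from the source): a point of weight w whose nearest
    facility is at squared distance d2 opens a new facility with probability
    min(w * d2 / f, 1); otherwise its weight is added to that facility.
    When more than `max_facilities` are open, the facility cost f is
    multiplied by `growth` and the facilities are re-inserted as weighted points.
    """
    if max_facilities < 1:
        raise ValueError("max_facilities must be at least 1")
    rng = random.Random(seed)
    f = f0
    facilities = []  # each entry is [point, weight]

    def insert(p, w):
        if not facilities:
            facilities.append([p, w])
            return
        j = min(range(len(facilities)), key=lambda i: sq_dist(p, facilities[i][0]))
        d2 = sq_dist(p, facilities[j][0])
        if rng.random() < min(w * d2 / f, 1.0):
            facilities.append([p, w])  # open a new facility at p
        else:
            facilities[j][1] += w  # serve p from its nearest facility

    for p in points:  # a single sequential pass over the stream
        insert(tuple(p), 1.0)
        while len(facilities) > max_facilities:
            f *= growth  # raise the cost of opening facilities
            old = facilities[:]
            facilities.clear()
            for q, w in old:
                insert(q, w)  # re-summarize the current weighted facilities
    return [(p, w) for p, w in facilities]
```

Memory is bounded by `max_facilities` weighted points regardless of stream length, and total weight equals the number of points read. The returned (point, weight) pairs are a weighted summary that still has to be reduced to k centers.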
Fast and Accurate k-means for Large Datasets
… Algorithms for solving k-means … When done: if more than k facilities, use normal k-means to consolidate …
– Compute a weighted representative sample of stream
– Solve k-means on sample
– Based on core set paradigm …
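The outline's final step (solve k-means on the weighted sample, running ordinary k-means only when more than k facilities remain) can be sketched as Lloyd's algorithm on weighted points. This is a generic sketch, not the slides' exact solver. The name `weighted_kmeans`, its parameters, and the k-means++ style seeding are assumptions chosen for illustration. The input is a list of (point, weight) pairs such as a streaming pass might produce.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(weighted_points, k, iters=20, seed=0):
    """Generic weighted Lloyd's sketch (hypothetical; seeding is an assumption).

    `weighted_points` is a list of (point, weight) pairs. If there are at
    most k of them they are returned as-is; otherwise they are consolidated
    into k centers.
    """
    rng = random.Random(seed)
    pts = [tuple(p) for p, _ in weighted_points]
    ws = [w for _, w in weighted_points]
    if len(pts) <= k:
        return pts  # nothing to consolidate
    # Assumed seeding (k-means++ style): sample each new center with probability
    # proportional to weight * squared distance to the nearest chosen center.
    centers = [rng.choices(pts, weights=ws)[0]]
    while len(centers) < k:
        scores = [w * min(sq_dist(p, c) for c in centers) for p, w in zip(pts, ws)]
        if sum(scores) == 0:
            break  # every point already coincides with a center
        centers.append(rng.choices(pts, weights=scores)[0])
    for _ in range(iters):
        # Assignment step: accumulate weighted coordinate sums per nearest center.
        sums = [[0.0] * len(pts[0]) for _ in centers]
        totals = [0.0] * len(centers)
        for p, w in zip(pts, ws):
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            totals[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        # Update step: move each center to the weighted mean of its cluster.
        new_centers = [
            tuple(s / totals[j] for s in sums[j]) if totals[j] > 0 else centers[j]
            for j in range(len(centers))
        ]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers
```

Because each sample point carries a weight, a heavy facility pulls its center as strongly as the many stream points it stands for. This is why k-means on a small weighted summary can approximate k-means on the full stream.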