Explainable k-means and k-medians clustering

M Moshkovitz, S Dasgupta… - … on machine learning, 2020 - proceedings.mlr.press
Many clustering algorithms lead to cluster assignments that are hard to explain, partially
because they depend on all the features of the data in a complicated way. To improve …

Turning Big Data Into Tiny Data: Constant-Size Coresets for k-Means, PCA, and Projective Clustering

D Feldman, M Schmidt, C Sohler - SIAM Journal on Computing, 2020 - SIAM
We develop and analyze a method to reduce the size of a very large set of data points in a
high-dimensional Euclidean space R^d to a small set of weighted points such that the result …

An effective and efficient algorithm for K-means clustering with new formulation

F Nie, Z Li, R Wang, X Li - IEEE Transactions on Knowledge …, 2022 - ieeexplore.ieee.org
K-means is one of the most simple and popular clustering algorithms, which is implemented as
a standard clustering method in most of machine learning researches. The goal of K-means …

Remaining discharge energy estimation for lithium-ion batteries based on future load prediction considering temperature and ageing effects

X Lai, Y Huang, H Gu, X Han, X Feng, H Dai, Y Zheng… - Energy, 2022 - Elsevier
The estimation of remaining discharge energy (RDE) of lithium-ion batteries is the basis for
the remaining driving range estimation of electric vehicles. The RDE estimation is affected …

StreamKM++: A clustering algorithm for data streams

MR Ackermann, M Märtens, C Raupach… - Journal of Experimental …, 2012 - dl.acm.org
We develop a new k-means clustering algorithm for data streams of points from a Euclidean
space. We call this algorithm StreamKM++. Our algorithm computes a small weighted …

The effectiveness of Lloyd-type methods for the k-means problem

R Ostrovsky, Y Rabani, LJ Schulman… - Journal of the ACM …, 2013 - dl.acm.org
We investigate variants of Lloyd's heuristic for clustering high-dimensional data in an attempt
to explain its popularity (a half century after its introduction) among practitioners, and in …

Fast and provably good seedings for k-means

O Bachem, M Lucic, H Hassani… - Advances in neural …, 2016 - proceedings.neurips.cc
Seeding, the task of finding initial cluster centers, is critical in obtaining high-quality
clusterings for k-Means. However, k-means++ seeding, the state of the art algorithm, does …

Statistical and computational guarantees of Lloyd's algorithm and its variants

Y Lu, HH Zhou - arXiv preprint arXiv:1612.02099, 2016 - arxiv.org
Clustering is a fundamental problem in statistics and machine learning. Lloyd's algorithm,
proposed in 1957, is still possibly the most widely used clustering algorithm in practice due …

Core-sets: Updated survey

D Feldman - Sampling techniques for supervised or unsupervised …, 2020 - Springer
In optimization or machine learning problems we are given a set of items, usually points in
some metric space, and the goal is to minimize or maximize an objective function over some …

Approximate k-means++ in sublinear time

O Bachem, M Lucic, SH Hassani… - Proceedings of the AAAI …, 2016 - ojs.aaai.org
The quality of K-Means clustering is extremely sensitive to proper initialization. The classic
remedy is to apply k-means++ to obtain an initial set of centers that is provably competitive …