Spectral methods for data science: A statistical perspective
Spectral methods have emerged as a simple yet surprisingly effective approach for
extracting information from massive, noisy and incomplete data. In a nutshell, spectral …
Heterogeneity for the win: One-shot federated clustering
In this work, we explore the unique challenges—and opportunities—of unsupervised
federated learning (FL). We develop and analyze a one-shot federated clustering scheme …
Robust federated learning in a heterogeneous environment
We study a recently proposed large-scale distributed learning paradigm, namely Federated
Learning, where the worker machines are end users' own devices. Statistical and …
Consistency of spectral clustering in stochastic block models
We analyze the performance of spectral clustering for community extraction in stochastic
block models. We show that, under mild conditions, spectral clustering applied to the …
Learning from untrusted data
The vast majority of theoretical results in machine learning and statistics assume that the
training data is a reliable reflection of the phenomena to be learned. Similarly, most learning …
Hierarchical clustering: Objective functions and algorithms
Hierarchical clustering is a recursive partitioning of a dataset into clusters at an increasingly
finer granularity. Motivated by the fact that most work on hierarchical clustering was based …
Mixture models, robustness, and sum of squares proofs
SB Hopkins, J Li - Proceedings of the 50th Annual ACM SIGACT …, 2018 - dl.acm.org
We use the Sum of Squares method to develop new efficient algorithms for learning well-
separated mixtures of Gaussians and robust mean estimation, both in high dimensions, that …
Socially fair k-means clustering
We show that the popular k-means clustering algorithm (Lloyd's heuristic), used for a variety
of scientific data, can result in outcomes that are unfavorable to subgroups of data (e.g., …
Robust moment estimation and improved clustering via sum of squares
We develop efficient algorithms for estimating low-degree moments of unknown distributions
in the presence of adversarial outliers and design a new family of convex relaxations for k …
Structured federated learning through clustered additive modeling
Heterogeneous federated learning without assuming any structure is challenging due to the
conflicts among non-identical data distributions of clients. In practice, clients often comprise …