HyperLogLog in practice: Algorithmic engineering of a state of the art cardinality estimation algorithm

S Heule, M Nunkesser, A Hall - … of the 16th International Conference on …, 2013 - dl.acm.org
Cardinality estimation has a wide range of applications and is of particular importance in
database systems. Various algorithms have been proposed in the past, and the …

[Book] Machine learning for data streams: with practical examples in MOA

A Bifet, R Gavaldà, G Holmes, B Pfahringer - 2023 - books.google.com
A hands-on approach to tasks and techniques in data stream mining and real-time analytics,
with examples in MOA, a popular freely available open-source software framework. Today …

Graph sketches: sparsification, spanners, and subgraphs

KJ Ahn, S Guha, A McGregor - Proceedings of the 31st ACM SIGMOD …, 2012 - dl.acm.org
When processing massive data sets, a core task is to construct synopses of the data. To be
useful, a synopsis data structure should be easy to construct while also yielding good …

Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs

G Valiant, P Valiant - Proceedings of the forty-third annual ACM …, 2011 - dl.acm.org
We introduce a new approach to characterizing the unobserved portion of a distribution,
which provides sublinear-sample estimators achieving arbitrarily small additive constant …

Sketching and sublinear data structures in genomics

G Marçais, B Solomon, R Patro… - Annual Review of …, 2019 - annualreviews.org
Large-scale genomics demands computational methods that scale sublinearly with the
growth of data. We review several data structures and sketching techniques that have been …

A framework for adversarially robust streaming algorithms

O Ben-Eliezer, R Jayaram, DP Woodruff… - ACM Journal of the ACM …, 2022 - dl.acm.org
We investigate the adversarial robustness of streaming algorithms. In this context, an
algorithm is considered robust if its performance guarantees hold even if the stream is …

Tight bounds for Lp samplers, finding duplicates in streams, and related problems

H Jowhari, M Sağlam, G Tardos - … of the thirtieth ACM SIGMOD-SIGACT …, 2011 - dl.acm.org
In this paper, we present near-optimal space bounds for Lp-samplers. Given a stream of
updates (additions and subtractions) to the coordinates of an underlying vector x in R^n, a …

SpreadSketch: Toward invertible and network-wide detection of superspreaders

L Tang, Q Huang, PPC Lee - IEEE INFOCOM 2020-IEEE …, 2020 - ieeexplore.ieee.org
Superspreaders (i.e., hosts with numerous distinct connections) remain severe threats to
production networks. How to accurately detect superspreaders in real-time at scale remains …

Evolving object-oriented designs with refactorings

L Tokuda, D Batory - Automated Software Engineering, 2001 - Springer
Refactorings are behavior-preserving program transformations that automate design
evolution in object-oriented applications. Three kinds of design evolution are: schema …

Mergeable summaries

PK Agarwal, G Cormode, Z Huang, JM Phillips… - ACM Transactions on …, 2013 - dl.acm.org
We study the mergeability of data summaries. Informally speaking, mergeability requires
that, given two summaries on two datasets, there is a way to merge the two summaries into a …