HyperLogLog in practice: Algorithmic engineering of a state of the art cardinality estimation algorithm
S Heule, M Nunkesser, A Hall - … of the 16th International Conference on …, 2013 - dl.acm.org
Cardinality estimation has a wide range of applications and is of particular importance in
database systems. Various algorithms have been proposed in the past, and the …
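As a rough illustration of the cardinality estimation problem this paper addresses, here is a minimal sketch of the basic HyperLogLog estimator. It assumes a 64-bit hash taken from Python's hashlib.blake2b and a precision of p = 12 (4096 registers), and it applies only the classic linear-counting fallback for small cardinalities. It deliberately omits the engineering changes the paper proposes, such as its empirical bias correction and sparse representation. The class and method names are hypothetical and are not taken from the paper.

```python
import hashlib
import math


class HyperLogLog:
    """Basic HyperLogLog with 2**p registers (illustrative sketch, not HLL++)."""

    def __init__(self, p: int = 12) -> None:
        # Restrict to m >= 128 so the closed-form alpha_m below is valid.
        if not 7 <= p <= 16:
            raise ValueError("p must be in [7, 16]")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    @staticmethod
    def _hash64(item: str) -> int:
        # Assumption: blake2b truncated to 8 bytes as the 64-bit hash.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, item: str) -> None:
        h = self._hash64(item)
        idx = h >> (64 - self.p)               # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        z = sum(2.0 ** -r for r in self.registers)
        raw = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros > 0:
            # Small-range correction: linear counting on empty registers.
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    hll = HyperLogLog(p=12)
    for i in range(100_000):
        hll.add(f"user-{i}")
    print(f"estimated distinct items: {hll.estimate():.0f}")
```

Re-adding an item can never change a register, so duplicates do not affect the estimate. With 4096 registers the expected relative standard error is about 1.04/sqrt(4096), or roughly 1.6%.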
[BOOK] Machine learning for data streams: with practical examples in MOA
A hands-on approach to tasks and techniques in data stream mining and real-time analytics,
with examples in MOA, a popular freely available open-source software framework. Today …
Graph sketches: sparsification, spanners, and subgraphs
KJ Ahn, S Guha, A McGregor - Proceedings of the 31st ACM SIGMOD …, 2012 - dl.acm.org
When processing massive data sets, a core task is to construct synopses of the data. To be
useful, a synopsis data structure should be easy to construct while also yielding good …
Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs
We introduce a new approach to characterizing the unobserved portion of a distribution,
which provides sublinear-sample estimators achieving arbitrarily small additive constant …
Sketching and sublinear data structures in genomics
Large-scale genomics demands computational methods that scale sublinearly with the
growth of data. We review several data structures and sketching techniques that have been …
A framework for adversarially robust streaming algorithms
We investigate the adversarial robustness of streaming algorithms. In this context, an
algorithm is considered robust if its performance guarantees hold even if the stream is …
Tight bounds for Lp samplers, finding duplicates in streams, and related problems
In this paper, we present near-optimal space bounds for Lp-samplers. Given a stream of
updates (additions and subtractions) to the coordinates of an underlying vector x in R^n, a …
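To make the sampling target concrete: in the standard definition, a perfect Lp sampler over such a stream returns coordinate i with probability |x_i|^p / ||x||_p^p. The sketch below is an exact reference sampler that stores x in full (linear space), so it illustrates only the output distribution and the turnstile update model. It is not the paper's sublinear-space construction. The class name ExactLpSampler and its methods are hypothetical, and integer deltas are assumed so that cancellations reach zero exactly.

```python
import random
from collections import defaultdict
from typing import Dict, Optional


class ExactLpSampler:
    """Reference L_p sampler over a turnstile stream; stores x exactly (linear space)."""

    def __init__(self, p: float, seed: Optional[int] = None) -> None:
        if p < 0:
            raise ValueError("p must be non-negative")
        self.p = p
        self.x: Dict[int, int] = defaultdict(int)
        self.rng = random.Random(seed)

    def update(self, i: int, delta: int) -> None:
        # Turnstile update: delta > 0 is an addition, delta < 0 a subtraction.
        self.x[i] += delta
        if self.x[i] == 0:
            del self.x[i]  # keep only the support of x

    def sample(self) -> Optional[int]:
        # Return i with probability |x_i|^p / sum_j |x_j|^p, or None if x is all zeros.
        if not self.x:
            return None
        indices = list(self.x)
        weights = [abs(self.x[i]) ** self.p for i in indices]
        return self.rng.choices(indices, weights=weights, k=1)[0]
```

With p = 0, every nonzero coordinate gets weight 1, so sample() draws uniformly from the support of x. A coordinate whose additions and subtractions cancel drops out of the support entirely.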
SpreadSketch: Toward invertible and network-wide detection of superspreaders
Superspreaders (i.e., hosts with numerous distinct connections) remain severe threats to
production networks. How to accurately detect superspreaders in real-time at scale remains …
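The superspreader definition above (a source with many distinct connections) can be made concrete with an exact, memory-hungry baseline that counts distinct destinations per source. The sketch assumes the input is (source, destination) pairs and a user-chosen threshold. The function name find_superspreaders is hypothetical, and this is not SpreadSketch's method.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def find_superspreaders(
    packets: Iterable[Tuple[str, str]], threshold: int
) -> List[Tuple[str, int]]:
    """Exact baseline: sources contacting at least `threshold` distinct destinations."""
    fan_out: Dict[str, Set[str]] = defaultdict(set)
    for src, dst in packets:
        fan_out[src].add(dst)  # repeated (src, dst) pairs count once
    hits = [(src, len(dsts)) for src, dsts in fan_out.items() if len(dsts) >= threshold]
    # Largest fan-out first; ties broken by source address for stable output.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))
```

This baseline uses memory proportional to the number of distinct (source, destination) pairs. A streaming detector would replace each per-source set with a compact distinct-count sketch to bound memory. The specific structure SpreadSketch uses is not reproduced here.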
Evolving object-oriented designs with refactorings
L Tokuda, D Batory - Automated Software Engineering, 2001 - Springer
Refactorings are behavior-preserving program transformations that automate design
evolution in object-oriented applications. Three kinds of design evolution are: schema …
Mergeable summaries
We study the mergeability of data summaries. Informally speaking, mergeability requires
that, given two summaries on two datasets, there is a way to merge the two summaries into a …
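The mergeability property can be illustrated with a Count-Min sketch, chosen here only because its merge is exact. Two sketches built with the same width, depth, and hash seed combine by adding their counter tables, and the result is identical to the sketch of the concatenated data. The class, the defaults (width=256, depth=4), and the blake2b-based row hashing are assumptions for this example. This is not a construction taken from the paper.

```python
import hashlib
from typing import Iterable, List


class CountMinSketch:
    """Count-Min sketch; sketches sharing width, depth, and seed merge by table addition."""

    def __init__(self, width: int = 256, depth: int = 4, seed: int = 0) -> None:
        self.width, self.depth, self.seed = width, depth, seed
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _bucket(self, row: int, item: str) -> int:
        # One 64-bit hash per row; the row index and seed are mixed into the input.
        key = f"{self.seed}:{row}:{item}".encode("utf-8")
        digest = hashlib.blake2b(key, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._bucket(row, item)] += count

    def query(self, item: str) -> int:
        # With non-negative counts, the minimum never underestimates the true count.
        return min(self.table[row][self._bucket(row, item)] for row in range(self.depth))

    def merge(self, other: "CountMinSketch") -> "CountMinSketch":
        if (self.width, self.depth, self.seed) != (other.width, other.depth, other.seed):
            raise ValueError("sketches must share width, depth, and seed")
        merged = CountMinSketch(self.width, self.depth, self.seed)
        merged.table = [
            [a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(self.table, other.table)
        ]
        return merged


def build(items: Iterable[str], **kwargs: int) -> CountMinSketch:
    sketch = CountMinSketch(**kwargs)
    for item in items:
        sketch.add(item)
    return sketch
```

Because every counter is a sum over the items that hash to it, summing two tables cell by cell yields exactly the table of the union. Summaries whose state is not a linear function of the input need more careful merge procedures.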