On synopses for distinct-value estimation under multiset operations

Y Wang, X Li, X Li, Y Wang - Knowledge and information systems, 2013 - Springer

Uncertain data have already widely existed in many practical applications recently, such as
sensor networks, RFID networks, location-based services, and mobile object management …

Cited by 136 Related articles All 10 versions

[PDF] nowpublishers.com

Synopses for massive data: Samples, histograms, wavelets, sketches

G Cormode, M Garofalakis, PJ Haas… - … and Trends® in …, 2011 - nowpublishers.com

Abstract Methods for Approximate Query Processing (AQP) are essential for dealing with
massive data. They are often the only means of providing interactive response times when …

Cited by 685 Related articles All 16 versions

[PDF] acm.org

Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs

G Valiant, P Valiant - Proceedings of the forty-third annual ACM …, 2011 - dl.acm.org

We introduce a new approach to characterizing the unobserved portion of a distribution,
which provides sublinear-sample estimators achieving arbitrarily small additive constant …

Cited by 358 Related articles All 8 versions

[PDF] harvard.edu

An optimal algorithm for the distinct elements problem

DM Kane, J Nelson, DP Woodruff - Proceedings of the twenty-ninth ACM …, 2010 - dl.acm.org

We give the first optimal algorithm for estimating the number of distinct elements in a data
stream, closing a long line of theoretical research on this problem begun by Flajolet and …

Cited by 409 Related articles All 18 versions

[PDF] springer.com

When the levee breaks: a practical guide to sketching algorithms for processing the flood of genomic data

WPM Rowe - Genome biology, 2019 - Springer

Considerable advances in genomics over the past decade have resulted in vast amounts of
data being generated and deposited in global archives. The growth of these archives …

Cited by 43 Related articles All 12 versions

[PDF] microsoft.com

Quickr: Lazily approximating complex adhoc queries in bigdata clusters

S Kandula, A Shanbhag, A Vitorovic, M Olma… - Proceedings of the …, 2016 - dl.acm.org

We present a system that approximates the answer to complex ad-hoc queries in big-data
clusters by injecting samplers on-the-fly and without requiring pre-existing samples …

Cited by 178 Related articles All 15 versions

[PDF] github.io

A sample-and-clean framework for fast and accurate query processing on dirty data

J Wang, S Krishnan, MJ Franklin, K Goldberg… - Proceedings of the …, 2014 - dl.acm.org

In emerging Big Data scenarios, obtaining timely, high-quality answers to aggregate queries
is difficult due to the challenges of processing and cleaning large, dirty data sets. To …

Cited by 174 Related articles All 16 versions

[PDF] siam.org

Centralities in large networks: Algorithms and observations

U Kang, S Papadimitriou, J Sun, H Tong - Proceedings of the 2011 SIAM …, 2011 - SIAM

Node centrality measures are important in a large number of graph applications, from search
and ranking to social and biological network analysis. In this paper we study node centrality …

Cited by 202 Related articles All 15 versions

[PDF] vldb.org

Cardinality estimation: An experimental survey

H Harmouch, F Naumann - Proceedings of the VLDB Endowment, 2017 - dl.acm.org

Data preparation and data profiling comprise many both basic and complex tasks to analyze
a dataset at hand and extract metadata, such as data distributions, key candidates, and …

Cited by 119 Related articles All 9 versions

[PDF] stanford.edu

The power of linear estimators

G Valiant, P Valiant - 2011 IEEE 52nd Annual Symposium on …, 2011 - ieeexplore.ieee.org

For a broad class of practically relevant distribution properties, which includes entropy and
support size, nearly all of the proposed estimators have an especially simple form. Given a …

Cited by 176 Related articles All 11 versions