A survey of queries over uncertain data

Y Wang, X Li, X Li, Y Wang - Knowledge and information systems, 2013 - Springer
Uncertain data have already widely existed in many practical applications recently, such as
sensor networks, RFID networks, location-based services, and mobile object management …

Synopses for massive data: Samples, histograms, wavelets, sketches

G Cormode, M Garofalakis, PJ Haas… - … and Trends® in …, 2011 - nowpublishers.com
Methods for Approximate Query Processing (AQP) are essential for dealing with
massive data. They are often the only means of providing interactive response times when …

Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs

G Valiant, P Valiant - Proceedings of the forty-third annual ACM …, 2011 - dl.acm.org
We introduce a new approach to characterizing the unobserved portion of a distribution,
which provides sublinear-sample estimators achieving arbitrarily small additive constant …

An optimal algorithm for the distinct elements problem

DM Kane, J Nelson, DP Woodruff - Proceedings of the twenty-ninth ACM …, 2010 - dl.acm.org
We give the first optimal algorithm for estimating the number of distinct elements in a data
stream, closing a long line of theoretical research on this problem begun by Flajolet and …

When the levee breaks: a practical guide to sketching algorithms for processing the flood of genomic data

WPM Rowe - Genome biology, 2019 - Springer
Considerable advances in genomics over the past decade have resulted in vast amounts of
data being generated and deposited in global archives. The growth of these archives …

Quickr: Lazily approximating complex adhoc queries in bigdata clusters

S Kandula, A Shanbhag, A Vitorovic, M Olma… - Proceedings of the …, 2016 - dl.acm.org
We present a system that approximates the answer to complex ad-hoc queries in big-data
clusters by injecting samplers on-the-fly and without requiring pre-existing samples …

A sample-and-clean framework for fast and accurate query processing on dirty data

J Wang, S Krishnan, MJ Franklin, K Goldberg… - Proceedings of the …, 2014 - dl.acm.org
In emerging Big Data scenarios, obtaining timely, high-quality answers to aggregate queries
is difficult due to the challenges of processing and cleaning large, dirty data sets. To …

Centralities in large networks: Algorithms and observations

U Kang, S Papadimitriou, J Sun, H Tong - Proceedings of the 2011 SIAM …, 2011 - SIAM
Node centrality measures are important in a large number of graph applications, from search
and ranking to social and biological network analysis. In this paper we study node centrality …

Cardinality estimation: An experimental survey

H Harmouch, F Naumann - Proceedings of the VLDB Endowment, 2017 - dl.acm.org
Data preparation and data profiling comprise many both basic and complex tasks to analyze
a dataset at hand and extract metadata, such as data distributions, key candidates, and …

The power of linear estimators

G Valiant, P Valiant - 2011 IEEE 52nd Annual Symposium on …, 2011 - ieeexplore.ieee.org
For a broad class of practically relevant distribution properties, which includes entropy and
support size, nearly all of the proposed estimators have an especially simple form. Given a …