A survey of queries over uncertain data
Y Wang, X Li, X Li, Y Wang - Knowledge and information systems, 2013 - Springer
Uncertain data have already widely existed in many practical applications recently, such as
sensor networks, RFID networks, location-based services, and mobile object management …
Synopses for massive data: Samples, histograms, wavelets, sketches
Abstract Methods for Approximate Query Processing (AQP) are essential for dealing with
massive data. They are often the only means of providing interactive response times when …
Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs
We introduce a new approach to characterizing the unobserved portion of a distribution,
which provides sublinear-sample estimators achieving arbitrarily small additive constant …
An optimal algorithm for the distinct elements problem
We give the first optimal algorithm for estimating the number of distinct elements in a data
stream, closing a long line of theoretical research on this problem begun by Flajolet and …
When the levee breaks: a practical guide to sketching algorithms for processing the flood of genomic data
WPM Rowe - Genome biology, 2019 - Springer
Considerable advances in genomics over the past decade have resulted in vast amounts of
data being generated and deposited in global archives. The growth of these archives …
Quickr: Lazily approximating complex adhoc queries in bigdata clusters
We present a system that approximates the answer to complex ad-hoc queries in big-data
clusters by injecting samplers on-the-fly and without requiring pre-existing samples …
A sample-and-clean framework for fast and accurate query processing on dirty data
In emerging Big Data scenarios, obtaining timely, high-quality answers to aggregate queries
is difficult due to the challenges of processing and cleaning large, dirty data sets. To …
Centralities in large networks: Algorithms and observations
Node centrality measures are important in a large number of graph applications, from search
and ranking to social and biological network analysis. In this paper we study node centrality …
Cardinality estimation: An experimental survey
H Harmouch, F Naumann - Proceedings of the VLDB Endowment, 2017 - dl.acm.org
Data preparation and data profiling comprise many both basic and complex tasks to analyze
a dataset at hand and extract metadata, such as data distributions, key candidates, and …
The power of linear estimators
For a broad class of practically relevant distribution properties, which includes entropy and
support size, nearly all of the proposed estimators have an especially simple form. Given a …