Data profiling: A tutorial
is to understand the dataset at hand and its metadata. The process of metadata discovery is
known as data profiling. Profiling activities range from ad-hoc approaches, such as eye …
Data dependencies for query optimization: a survey
Effective query optimization is a core feature of any database management system. While
most query optimization techniques make use of simple metadata, such as cardinalities and …
Efficient denial constraint discovery with Hydra
Denial constraints (DCs) are a generalization of many other integrity constraints (ICs) widely
used in databases, such as key constraints, functional dependencies, or order …
Data dependencies extended for variety and veracity: A family tree
Besides the conventional schema-oriented tasks, data dependencies are recently revisited
for data quality applications, such as violation detection, data repairing and record matching …
Pattern functional dependencies for data cleaning
Patterns (or regex-based expressions) are widely used to constrain the format of a domain
(or a column), e.g., a Year column should contain only four digits, and thus a value like “1980 …
Distributed implementations of dependency discovery algorithms
We analyze the problem of discovering dependencies from distributed big data. Existing
(non-distributed) algorithms focus on minimizing computation by pruning the search space …
TSDDISCOVER: Discovering Data Dependency for Time Series Data
Intelligent devices often produce time series data that suffer from significant data quality
issues. While the utilization of data dependency in error detection and data repair has been …
Fast approximate denial constraint discovery
R Xiao, Z Tan, H Wang, S Ma - Proceedings of the VLDB Endowment, 2022 - dl.acm.org
We investigate the problem of discovering approximate denial constraints (DCs), for finding
DCs that hold with some exceptions to avoid overfitting real-life dirty data and facilitate data …
Effective and complete discovery of bidirectional order dependencies via set-based axioms
Integrity constraints (ICs) are useful for expressing and enforcing application semantics.
Formulating ICs manually, however, requires domain expertise, is prone to human error, and …
Fast incremental discovery of pointwise order dependencies
Z Tan, A Ran, S Ma, S Qin - Proceedings of the VLDB Endowment, 2020 - dl.acm.org
Pointwise order dependencies (PODs) are dependencies that specify ordering semantics on
attributes of tuples. POD discovery refers to the process of identifying the set Σ of valid and …