Data profiling: A tutorial

Z Abedjan, L Golab, F Naumann - Proceedings of the 2017 ACM …, 2017 - dl.acm.org
is to understand the dataset at hand and its metadata. The process of metadata discovery is
known as data profiling. Profiling activities range from ad-hoc approaches, such as eye …

Data dependencies for query optimization: a survey

J Kossmann, T Papenbrock, F Naumann - The VLDB Journal, 2022 - Springer
Effective query optimization is a core feature of any database management system. While
most query optimization techniques make use of simple metadata, such as cardinalities and …

Efficient denial constraint discovery with Hydra

T Bleifuß, S Kruse, F Naumann - Proceedings of the VLDB Endowment, 2017 - dl.acm.org
Denial constraints (DCs) are a generalization of many other integrity constraints (ICs) widely
used in databases, such as key constraints, functional dependencies, or order …
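
As a rough illustration of how a functional dependency can be read as a denial constraint, the sketch below checks the FD zip -> city in the form "no two tuples t, s may have t.zip = s.zip and t.city != s.city". The function name `fd_violations`, the column names, and the sample rows are all made up for this example, and the brute-force pairwise scan is only a validity check for one given constraint; it is not the Hydra discovery algorithm.

```python
from itertools import combinations

def fd_violations(rows, lhs, rhs):
    """Return index pairs (i, j) that violate the denial constraint
    not(t[lhs] == s[lhs] and t[rhs] != s[rhs]), i.e. the FD lhs -> rhs.
    Hypothetical helper: a naive O(n^2) pairwise comparison."""
    return [
        (i, j)
        for (i, t), (j, s) in combinations(enumerate(rows), 2)
        if t[lhs] == s[lhs] and t[rhs] != s[rhs]
    ]

# Made-up sample data: row 2 disagrees with rows 0 and 1 on city.
rows = [
    {"zip": "10115", "city": "Berlin"},
    {"zip": "10115", "city": "Berlin"},
    {"zip": "10115", "city": "Potsdam"},
    {"zip": "14467", "city": "Potsdam"},
]
violations = fd_violations(rows, "zip", "city")
print(violations)
```

An empty result would mean the constraint holds on the given rows; each returned pair identifies two tuples that jointly violate it.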

Data dependencies extended for variety and veracity: A family tree

S Song, F Gao, R Huang… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Besides the conventional schema-oriented tasks, data dependencies are recently revisited
for data quality applications, such as violation detection, data repairing and record matching …

Pattern functional dependencies for data cleaning

A Qahtan, N Tang, M Ouzzani, Y Cao… - Proceedings of the …, 2020 - research.ed.ac.uk
Patterns (or regex-based expressions) are widely used to constrain the format of a domain
(or a column), e.g., a Year column should contain only four digits, and thus a value like “1980 …
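
The four-digit Year constraint mentioned in this snippet can be sketched as a simple regex check on a column. This is only a plain format check under assumed names (`YEAR_PATTERN`, `pattern_violations`) and made-up sample values; it does not implement the pattern functional dependencies proposed in the paper.

```python
import re

# Assumed format constraint: a Year value consists of exactly four digits.
YEAR_PATTERN = re.compile(r"\d{4}")

def pattern_violations(values, pattern=YEAR_PATTERN):
    """Return the values that do not match the pattern in full.
    Hypothetical helper for illustration only."""
    return [v for v in values if not pattern.fullmatch(v)]

# Made-up column values.
bad = pattern_violations(["1980", "198o", "80", "2020"])
print(bad)
```

Using `fullmatch` rather than `match` or `search` matters here: it rejects values that merely contain or start with four digits.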

Distributed implementations of dependency discovery algorithms

H Saxena, L Golab, IF Ilyas - Proceedings of the VLDB Endowment, 2019 - dl.acm.org
We analyze the problem of discovering dependencies from distributed big data. Existing
(non-distributed) algorithms focus on minimizing computation by pruning the search space …

TSDDISCOVER: Discovering Data Dependency for Time Series Data

X Ding, Y Li, H Wang, C Wang, Y Liu… - 2024 IEEE 40th …, 2024 - ieeexplore.ieee.org
Intelligent devices often produce time series data that suffer from significant data quality
issues. While the utilization of data dependency in error detection and data repair has been …

Fast approximate denial constraint discovery

R Xiao, Z Tan, H Wang, S Ma - Proceedings of the VLDB Endowment, 2022 - dl.acm.org
We investigate the problem of discovering approximate denial constraints (DCs), for finding
DCs that hold with some exceptions to avoid overfitting real-life dirty data and facilitate data …

Effective and complete discovery of bidirectional order dependencies via set-based axioms

J Szlichta, P Godfrey, L Golab, M Kargar, D Srivastava - The VLDB Journal, 2018 - Springer
Integrity constraints (ICs) are useful for expressing and enforcing application semantics.
Formulating ICs manually, however, requires domain expertise, is prone to human error, and …

Fast incremental discovery of pointwise order dependencies

Z Tan, A Ran, S Ma, S Qin - Proceedings of the VLDB Endowment, 2020 - dl.acm.org
Pointwise order dependencies (PODs) are dependencies that specify ordering semantics on
attributes of tuples. POD discovery refers to the process of identifying the set Σ of valid and …