Data cleaning: Overview and emerging challenges

X Chu, IF Ilyas, S Krishnan, J Wang - Proceedings of the 2016 …, 2016 - dl.acm.org
Detecting and repairing dirty data is one of the perennial challenges in data analytics, and
failure to do so can result in inaccurate analytics and unreliable decisions. Over the past few …

[BOOK][B] Data cleaning

IF Ilyas, X Chu - 2019 - books.google.com
This is an overview of the end-to-end data cleaning process. Data quality is one of the most
important problems in data management, since dirty data often leads to inaccurate data …

Scorpion: Explaining away outliers in aggregate queries

E Wu, S Madden - 2013 - dspace.mit.edu
Database users commonly explore large data sets by running aggregate queries that project
the data down to a smaller number of points and dimensions, and visualizing the results …

Trends in cleaning relational data: Consistency and deduplication

IF Ilyas, X Chu - Foundations and Trends® in Databases, 2015 - nowpublishers.com
Data quality is one of the most important problems in data management, since dirty data
often leads to inaccurate data analytics results and wrong business decisions. Poor data …

A formal approach to finding explanations for database queries

S Roy, D Suciu - Proceedings of the 2014 ACM SIGMOD international …, 2014 - dl.acm.org
As a consequence of the popularity of big data, many users with a variety of backgrounds
seek to extract high level information from datasets collected from various sources and …

Data quality: From theory to practice

W Fan - ACM SIGMOD Record, 2015 - dl.acm.org
Data quantity and data quality, like two sides of a coin, are equally important to data
management. This paper provides an overview of recent advances in the study of data …

Data X-Ray: A diagnostic tool for data errors

X Wang, XL Dong, A Meliou - Proceedings of the 2015 ACM SIGMOD …, 2015 - dl.acm.org
A lot of systems and applications are data-driven, and the correctness of their operation
relies heavily on the correctness of their data. While existing data cleaning techniques can …

Data provenance

B Glavic - Foundations and Trends® in Databases, 2021 - nowpublishers.com
Data provenance has evolved from a niche topic to a mainstream area of research in
databases and other research communities. This article gives a comprehensive introduction …

XInsight: Explainable data analysis through the lens of causality

P Ma, R Ding, S Wang, S Han, D Zhang - … of the ACM on Management of …, 2023 - dl.acm.org
In light of the growing popularity of Exploratory Data Analysis (EDA), understanding the
underlying causes of the knowledge acquired by EDA is crucial. However, it remains under …

Qualitative data cleaning

X Chu, IF Ilyas - Proceedings of the VLDB Endowment, 2016 - dl.acm.org
Data quality is one of the most important problems in data management, since dirty data
often leads to inaccurate data analytics results and wrong business decisions. Data cleaning …