AutoML to date and beyond: Challenges and opportunities

SK Karmaker, MM Hassan, MJ Smith, L Xu… - ACM Computing …, 2021 - dl.acm.org
As big data becomes ubiquitous across domains, and more and more stakeholders aspire to
make the most of their data, demand for machine learning tools has spurred researchers to …

Data cleaning: Overview and emerging challenges

X Chu, IF Ilyas, S Krishnan, J Wang - Proceedings of the 2016 …, 2016 - dl.acm.org
Detecting and repairing dirty data is one of the perennial challenges in data analytics, and
failure to do so can result in inaccurate analytics and unreliable decisions. Over the past few …

Can LLM already serve as a database interface? A big bench for large-scale database grounded text-to-SQLs

J Li, B Hui, G Qu, J Yang, B Li, B Li… - Advances in …, 2024 - proceedings.neurips.cc
Text-to-SQL parsing, which aims at converting natural language instructions into executable
SQLs, has gained increasing attention in recent years. In particular, GPT-4 and Claude-2 …

[BOOK][B] Data cleaning

IF Ilyas, X Chu - 2019 - books.google.com
This is an overview of the end-to-end data cleaning process. Data quality is one of the most
important problems in data management, since dirty data often leads to inaccurate data …

The effects of data quality on machine learning performance

L Budach, M Feuerpfeil, N Ihde, A Nathansen… - arXiv preprint arXiv …, 2022 - arxiv.org
Modern artificial intelligence (AI) applications require large quantities of training and test
data. This need creates critical challenges not only concerning the availability of such data …

HoloClean: Holistic data repairs with probabilistic inference

T Rekatsinas, X Chu, IF Ilyas, C Ré - arXiv preprint arXiv:1702.00820, 2017 - arxiv.org
We introduce HoloClean, a framework for holistic data repairing driven by probabilistic
inference. HoloClean unifies existing qualitative data repairing approaches, which rely on …

Machine knowledge: Creation and curation of comprehensive knowledge bases

G Weikum, XL Dong, S Razniewski… - … and Trends® in …, 2021 - nowpublishers.com
Equipping machines with comprehensive knowledge of the world's entities and their
relationships has been a longstanding goal of AI. Over the last decade, large-scale …

Automating large-scale data quality verification

S Schelter, D Lange, P Schmidt, M Celikel… - Proceedings of the …, 2018 - dl.acm.org
Modern companies and institutions rely on data to guide every single business process and
decision. Missing or incorrect information seriously compromises any decision process …

[BOOK][B] Magellan: Toward building entity matching management systems

PV Konda - 2018 - search.proquest.com
Entity matching (EM) identifies data instances that refer to the same real-world entity, such
as (David Smith, UWMadison) and (DM Smith, UWM). This problem has been a long …

Detecting data errors: Where are we and what needs to be done?

Z Abedjan, X Chu, D Deng, RC Fernandez… - Proceedings of the …, 2016 - dl.acm.org
Data cleaning has played a critical role in ensuring data quality for enterprise applications.
Naturally, there has been extensive research in this area, and many data cleaning …