AutoML to date and beyond: Challenges and opportunities
As big data becomes ubiquitous across domains, and more and more stakeholders aspire to
make the most of their data, demand for machine learning tools has spurred researchers to …
Data cleaning: Overview and emerging challenges
Detecting and repairing dirty data is one of the perennial challenges in data analytics, and
failure to do so can result in inaccurate analytics and unreliable decisions. Over the past few …
Can LLM already serve as a database interface? A big bench for large-scale database grounded text-to-SQLs
Text-to-SQL parsing, which aims at converting natural language instructions into executable
SQLs, has gained increasing attention in recent years. In particular, GPT-4 and Claude-2 …
The effects of data quality on machine learning performance
L Budach, M Feuerpfeil, N Ihde, A Nathansen… - arXiv preprint arXiv …, 2022 - arxiv.org
Modern artificial intelligence (AI) applications require large quantities of training and test
data. This need creates critical challenges not only concerning the availability of such data …
HoloClean: Holistic data repairs with probabilistic inference
We introduce HoloClean, a framework for holistic data repairing driven by probabilistic
inference. HoloClean unifies existing qualitative data repairing approaches, which rely on …
Machine knowledge: Creation and curation of comprehensive knowledge bases
Equipping machines with comprehensive knowledge of the world's entities and their
relationships has been a longstanding goal of AI. Over the last decade, large-scale …
Automating large-scale data quality verification
Modern companies and institutions rely on data to guide every single business process and
decision. Missing or incorrect information seriously compromises any decision process …
[BOOK][B] Magellan: Toward building entity matching management systems
PV Konda - 2018 - search.proquest.com
Entity matching (EM) identifies data instances that refer to the same real-world entity, such
as (David Smith, UWMadison) and (DM Smith, UWM). This problem has been a long …
Detecting data errors: Where are we and what needs to be done?
Data cleaning has played a critical role in ensuring data quality for enterprise applications.
Naturally, there has been extensive research in this area, and many data cleaning …