Challenges in deploying machine learning: a survey of case studies

A Paleyes, RG Urma, ND Lawrence - ACM Computing Surveys, 2022 - dl.acm.org
In recent years, machine learning has transitioned from a field of academic research interest
to a field capable of solving real-world business problems. However, the deployment of …

Knowledge graph quality management: a comprehensive survey

B Xue, L Zou - IEEE Transactions on Knowledge and Data …, 2022 - ieeexplore.ieee.org
As a powerful expression of human knowledge in a structural form, knowledge graph (KG)
has drawn great attention from both the academia and the industry and a large number of …

Holistic evaluation of language models

P Liang, R Bommasani, T Lee, D Tsipras… - arXiv preprint arXiv …, 2022 - arxiv.org
Language models (LMs) are becoming the foundation for almost all major language
technologies, but their capabilities, limitations, and risks are not well understood. We present …

Can foundation models wrangle your data?

A Narayan, I Chami, L Orr, S Arora, C Ré - arXiv preprint arXiv:2205.09911, 2022 - arxiv.org
Foundation Models (FMs) are models trained on large corpora of data that, at very large
scale, can generalize to new tasks without any task-specific finetuning. As these models …

A benchmark for data imputation methods

S Jäger, A Allhorn, F Bießmann - Frontiers in Big Data, 2021 - frontiersin.org
With the increasing importance and complexity of data pipelines, data quality became one of
the key challenges in modern software applications. The importance of data quality has …

[Book] Data cleaning

IF Ilyas, X Chu - 2019 - books.google.com
This is an overview of the end-to-end data cleaning process. Data quality is one of the most
important problems in data management, since dirty data often leads to inaccurate data …

HoloClean: Holistic data repairs with probabilistic inference

T Rekatsinas, X Chu, IF Ilyas, C Ré - arXiv preprint arXiv:1702.00820, 2017 - arxiv.org
We introduce HoloClean, a framework for holistic data repairing driven by probabilistic
inference. HoloClean unifies existing qualitative data repairing approaches, which rely on …

Creating embeddings of heterogeneous relational datasets for data integration tasks

R Cappuzzo, P Papotti… - Proceedings of the 2020 …, 2020 - dl.acm.org
Deep learning based techniques have been recently used with promising results for data
integration problems. Some methods directly use pre-trained embeddings that were trained …

HoloDetect: Few-shot learning for error detection

A Heidari, J McGrath, IF Ilyas… - Proceedings of the 2019 …, 2019 - dl.acm.org
We introduce a few-shot learning framework for error detection. We show that data
augmentation (a form of weak supervision) is key to training high-quality, ML-based error …

Data Integration: The Current Status and the Way Forward.

M Stonebraker, IF Ilyas - IEEE Data Eng. Bull., 2018 - cs.uwaterloo.ca
We discuss scalable data integration challenges in the enterprise inspired by our
experience at Tamr. We use multiple real customer examples to highlight the technical …