Data cleaning and machine learning: a systematic literature review

PO Côté, A Nikanjam, N Ahmed, D Humeniuk… - Automated Software …, 2024 - Springer
Machine Learning (ML) is integrated into a growing number of systems for various
applications. Because the performance of an ML model is highly dependent on the quality of …

Automated rule-based data cleaning using NLP

K Mavrogiorgos, A Mavrogiorgou… - … 32nd Conference of …, 2022 - ieeexplore.ieee.org
Data Cleaning is a subfield of Data Mining that has been thriving in recent years. Ensuring the
reliability of data, either when generated or received, is of vital importance to provide the …

Desbordante: from benchmarking suite to high-performance science-intensive data profiler (preprint)

G Chernishev, M Polyntsov, A Chizhov… - arXiv preprint arXiv …, 2023 - arxiv.org
Pioneering data profiling systems such as Metanome and OpenClean brought public
attention to science-intensive data profiling. This type of profiling aims to extract complex …

Solving data quality problems with Desbordante: a demo

G Chernishev, M Polyntsov, A Chizhov… - arXiv preprint arXiv …, 2023 - arxiv.org
Data profiling is an essential process in modern data-driven industries. One of its critical
components is the discovery and validation of complex statistics, including functional …

RumbleML: program the lakehouse with JSONiq

G Fourny, D Dao, CB Cikis, C Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
Lakehouse systems have reached in the past few years unprecedented size and
heterogeneity and have been embraced by many industry players. However, they are often …

Feature Discovery for Data-Centric AI

A Ionescu - 2025 - repository.tudelft.nl
We are witnessing a paradigm shift in machine learning (ML) and artificial intelligence (AI)
from a focus primarily on innovating ML models, the model-centric paradigm, to prioritising …

Enhancing Data Accuracy in Public Health Datasets Through a Constructive Research Design

DMR Vargas - 2024 - search.proquest.com
The ongoing generation of data has led to the development of various approaches to
produce reliable and accurate data products. However, standardizing data cleaning and …