Data cleaning and machine learning: a systematic literature review
Abstract: Machine Learning (ML) is integrated into a growing number of systems for various
applications. Because the performance of an ML model is highly dependent on the quality of …
Automated rule-based data cleaning using NLP
K Mavrogiorgos, A Mavrogiorgou… - … 32nd Conference of …, 2022 - ieeexplore.ieee.org
Data Cleaning is a subfield of Data Mining that has been thriving in recent years. Ensuring the
reliability of data, either when generated or received, is of vital importance to provide the …
Desbordante: from benchmarking suite to high-performance science-intensive data profiler (preprint)
G Chernishev, M Polyntsov, A Chizhov… - arXiv preprint arXiv …, 2023 - arxiv.org
Pioneering data profiling systems such as Metanome and OpenClean brought public
attention to science-intensive data profiling. This type of profiling aims to extract complex …
Solving data quality problems with desbordante: a demo
G Chernishev, M Polyntsov, A Chizhov… - arXiv preprint arXiv …, 2023 - arxiv.org
Data profiling is an essential process in modern data-driven industries. One of its critical
components is the discovery and validation of complex statistics, including functional …
RumbleML: program the lakehouse with JSONiq
Lakehouse systems have reached in the past few years unprecedented size and
heterogeneity and have been embraced by many industry players. However, they are often …
Feature Discovery for Data-Centric AI
A Ionescu - 2025 - repository.tudelft.nl
We are witnessing a paradigm shift in machine learning (ML) and artificial intelligence (AI)
from a focus primarily on innovating ML models, the model-centric paradigm, to prioritising …
Enhancing Data Accuracy in Public Health Datasets Through a Constructive Research Design
DMR Vargas - 2024 - search.proquest.com
The ongoing generation of data has led to the development of various approaches to
produce reliable and accurate data products. However, standardizing data cleaning and …