Data-centric artificial intelligence: A survey

D Zha, ZP Bhat, KH Lai, F Yang, Z Jiang… - arXiv preprint arXiv …, 2023 - arxiv.org
Artificial Intelligence (AI) is making a profound impact in almost every domain. A vital enabler
of its great success is the availability of abundant and high-quality data for building machine …

Consolidated reporting guidelines for prognostic and diagnostic machine learning modeling studies: development and validation

W Klement, K El Emam - Journal of Medical Internet Research, 2023 - jmir.org
Background: The reporting of machine learning (ML) prognostic and diagnostic modeling
studies is often inadequate, making it difficult to understand and replicate such studies. To …

Can you rely on your model evaluation? Improving model evaluation with synthetic test data

B van Breugel, N Seedat, F Imrie… - Advances in Neural …, 2024 - proceedings.neurips.cc
Evaluating the performance of machine learning models on diverse and underrepresented
subgroups is essential for ensuring fairness and reliability in real-world applications …

TRIAGE: Characterizing and auditing training data for improved regression

N Seedat, J Crabbé, Z Qian… - Advances in Neural …, 2024 - proceedings.neurips.cc
Data quality is crucial for robust machine learning algorithms, with the recent interest in
data-centric AI emphasizing the importance of training data characterization. However, current …

Reimagining synthetic tabular data generation through data-centric AI: A comprehensive benchmark

L Hansen, N Seedat… - Advances in Neural …, 2023 - proceedings.neurips.cc
Synthetic data serves as an alternative in training machine learning models, particularly
when real-world data is limited or inaccessible. However, ensuring that synthetic data …

ydata-profiling: Accelerating data-centric AI with high-quality data

F Clemente, GM Ribeiro, A Quemy, MS Santos… - Neurocomputing, 2023 - Elsevier
ydata-profiling is an open-source Python package for advanced exploratory data
analysis that enables users to generate data profiling reports in a simple, fast, and efficient …

A seven-layer model with checklists for standardising fairness assessment throughout the AI lifecycle

A Agarwal, H Agarwal - AI and Ethics, 2024 - Springer
Problem statement: Standardisation of AI fairness rules and benchmarks is challenging
because AI fairness and other ethical requirements depend on multiple factors, such as …

A Data-Centric AI Paradigm for Socio-Industrial and Global Challenges

A Majeed, SO Hwang - Electronics, 2024 - mdpi.com
Due to huge investments by both the public and private sectors, artificial intelligence (AI) has
made tremendous progress in solving multiple real-world problems such as disease …

Curated LLM: Synergy of LLMs and data curation for tabular augmentation in ultra low-data regimes

N Seedat, N Huynh, B van Breugel… - arXiv preprint arXiv …, 2023 - arxiv.org
Machine Learning (ML) in low-data settings remains an underappreciated yet crucial
problem. This challenge is pronounced in low-to-middle income countries where access to …

DAGnosis: Localized Identification of Data Inconsistencies using Structures

N Huynh, J Berrevoets, N Seedat… - International …, 2024 - proceedings.mlr.press
Identification and appropriate handling of inconsistencies in data at deployment time is
crucial to reliably use machine learning models. While recent data-centric methods are able …