Data-centric artificial intelligence: A survey
Artificial Intelligence (AI) is making a profound impact in almost every domain. A vital enabler
of its great success is the availability of abundant and high-quality data for building machine …
Consolidated reporting guidelines for prognostic and diagnostic machine learning modeling studies: development and validation
Background The reporting of machine learning (ML) prognostic and diagnostic modeling
studies is often inadequate, making it difficult to understand and replicate such studies. To …
Can you rely on your model evaluation? improving model evaluation with synthetic test data
Evaluating the performance of machine learning models on diverse and underrepresented
subgroups is essential for ensuring fairness and reliability in real-world applications …
TRIAGE: Characterizing and auditing training data for improved regression
Data quality is crucial for robust machine learning algorithms, with the recent interest in
data-centric AI emphasizing the importance of training data characterization. However, current …
Reimagining synthetic tabular data generation through data-centric AI: A comprehensive benchmark
Synthetic data serves as an alternative in training machine learning models, particularly
when real-world data is limited or inaccessible. However, ensuring that synthetic data …
ydata-profiling: Accelerating data-centric AI with high-quality data
F Clemente, GM Ribeiro, A Quemy, MS Santos… - Neurocomputing, 2023 - Elsevier
Abstract ydata-profiling is an open-source Python package for advanced exploratory data
analysis that enables users to generate data profiling reports in a simple, fast, and efficient …
A seven-layer model with checklists for standardising fairness assessment throughout the AI lifecycle
Problem statement: Standardisation of AI fairness rules and benchmarks is challenging
because AI fairness and other ethical requirements depend on multiple factors, such as …
A Data-Centric AI Paradigm for Socio-Industrial and Global Challenges
A Majeed, SO Hwang - Electronics, 2024 - mdpi.com
Due to huge investments by both the public and private sectors, artificial intelligence (AI) has
made tremendous progress in solving multiple real-world problems such as disease …
Curated LLM: Synergy of LLMs and data curation for tabular augmentation in ultra low-data regimes
Machine Learning (ML) in low-data settings remains an underappreciated yet crucial
problem. This challenge is pronounced in low-to-middle income countries where access to …
DAGnosis: Localized Identification of Data Inconsistencies using Structures
Identification and appropriate handling of inconsistencies in data at deployment time is
crucial to reliably use machine learning models. While recent data-centric methods are able …