DC-Check: A data-centric AI checklist to guide the development of reliable machine learning systems

D Zha, ZP Bhat, KH Lai, F Yang, Z Jiang… - arXiv preprint arXiv …, 2023 - arxiv.org

Artificial Intelligence (AI) is making a profound impact in almost every domain. A vital enabler
of its great success is the availability of abundant and high-quality data for building machine …

Cited by 162 · All 3 versions

Consolidated reporting guidelines for prognostic and diagnostic machine learning modeling studies: development and validation

W Klement, K El Emam - Journal of Medical Internet Research, 2023 - jmir.org

Background: The reporting of machine learning (ML) prognostic and diagnostic modeling
studies is often inadequate, making it difficult to understand and replicate such studies. To …

Cited by 15 · All 7 versions

Can you rely on your model evaluation? Improving model evaluation with synthetic test data

B van Breugel, N Seedat, F Imrie… - Advances in Neural …, 2024 - proceedings.neurips.cc

Evaluating the performance of machine learning models on diverse and underrepresented
subgroups is essential for ensuring fairness and reliability in real-world applications …

Cited by 6 · All 5 versions

TRIAGE: Characterizing and auditing training data for improved regression

N Seedat, J Crabbé, Z Qian… - Advances in Neural …, 2024 - proceedings.neurips.cc

Data quality is crucial for robust machine learning algorithms, with the recent interest in data-centric
AI emphasizing the importance of training data characterization. However, current …

Cited by 4 · All 5 versions

Reimagining synthetic tabular data generation through data-centric AI: A comprehensive benchmark

L Hansen, N Seedat… - Advances in Neural …, 2023 - proceedings.neurips.cc

Synthetic data serves as an alternative in training machine learning models, particularly
when real-world data is limited or inaccessible. However, ensuring that synthetic data …

Cited by 6 · All 6 versions

ydata-profiling: Accelerating data-centric AI with high-quality data

F Clemente, GM Ribeiro, A Quemy, MS Santos… - Neurocomputing, 2023 - Elsevier

ydata-profiling is an open-source Python package for advanced exploratory data
analysis that enables users to generate data profiling reports in a simple, fast, and efficient …

Cited by 6 · All 2 versions

A seven-layer model with checklists for standardising fairness assessment throughout the AI lifecycle

A Agarwal, H Agarwal - AI and Ethics, 2024 - Springer

Problem statement: Standardisation of AI fairness rules and benchmarks is challenging
because AI fairness and other ethical requirements depend on multiple factors, such as …

Cited by 9

A Data-Centric AI Paradigm for Socio-Industrial and Global Challenges

A Majeed, SO Hwang - Electronics, 2024 - mdpi.com

Due to huge investments by both the public and private sectors, artificial intelligence (AI) has
made tremendous progress in solving multiple real-world problems such as disease …

Cited by 1

Curated LLM: Synergy of LLMs and data curation for tabular augmentation in ultra low-data regimes

N Seedat, N Huynh, B van Breugel… - arXiv preprint arXiv …, 2023 - arxiv.org

Machine Learning (ML) in low-data settings remains an underappreciated yet crucial
problem. This challenge is pronounced in low-to-middle income countries where access to …

Cited by 5 · All 3 versions

DAGnosis: Localized Identification of Data Inconsistencies using Structures

N Huynh, J Berrevoets, N Seedat… - International …, 2024 - proceedings.mlr.press

Identification and appropriate handling of inconsistencies in data at deployment time is
crucial to reliably use machine learning models. While recent data-centric methods are able …

Cited by 2 · All 3 versions