Learning from data with structured missingness

R Mitra, SF McGough, T Chakraborti… - Nature Machine …, 2023 - nature.com
Missing data are an unavoidable complication in many machine learning tasks. When data
are 'missing at random' there exist a range of tools and techniques to deal with the issue …

Data collection and quality challenges in deep learning: A data-centric AI perspective

SE Whang, Y Roh, H Song, JG Lee - The VLDB Journal, 2023 - Springer
Data-centric AI is at the center of a fundamental shift in software engineering where machine
learning becomes the new software, powered by big data and computing infrastructure …

RAB: Provable robustness against backdoor attacks

M Weber, X Xu, B Karlaš, C Zhang… - 2023 IEEE Symposium …, 2023 - ieeexplore.ieee.org
Recent studies have shown that deep neural networks (DNNs) are vulnerable to
adversarial attacks, including evasion and backdoor (poisoning) attacks. On the defense …

Opportunities and Challenges in Data-Centric AI

S Kumar, S Datta, V Singh, SK Singh, R Sharma - IEEE Access, 2024 - ieeexplore.ieee.org
Artificial intelligence (AI) systems are trained to solve complex problems and learn to
perform specific tasks by using large volumes of data, such as prediction, classification …

A data quality-driven view of MLOps

C Renggli, L Rimanic, NM Gürel, B Karlaš… - arXiv preprint arXiv …, 2021 - arxiv.org
Developing machine learning models can be seen as a process similar to the one
established for traditional software development. A key difference between the two lies in the …

Data management for machine learning: A survey

C Chai, J Wang, Y Luo, Z Niu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Machine learning (ML) has widespread applications and has revolutionized many
industries, but suffers from several challenges. First, sufficient high-quality training data is …

CleanML: A study for evaluating the impact of data cleaning on ML classification tasks

P Li, X Rao, J Blase, Y Zhang, X Chu… - 2021 IEEE 37th …, 2021 - ieeexplore.ieee.org
Data quality affects machine learning (ML) model performances, and data scientists spend
a considerable amount of time on data cleaning before model training. However, to date, there …

VF-PS: How to select important participants in vertical federated learning, efficiently and securely?

J Jiang, L Burkhalter, F Fu, B Ding… - Advances in …, 2022 - proceedings.neurips.cc
Vertical Federated Learning (VFL), that trains federated models over vertically
partitioned data, has emerged as an important learning paradigm. However, existing VFL …

GoodCore: Data-effective and data-efficient machine learning through coreset selection over incomplete data

C Chai, J Liu, N Tang, J Fan, D Miao, J Wang… - Proceedings of the …, 2023 - dl.acm.org
Given a dataset with incomplete data (e.g., missing values), training a machine learning
model over the incomplete data requires two steps. First, it requires a data-effective step that …

From Cleaning before ML to Cleaning for ML

F Neutatz, B Chen, Z Abedjan, E Wu - IEEE Data Eng. Bull., 2021 - scholar.archive.org
Data cleaning is widely regarded as a critical piece of machine learning (ML) applications,
as data errors can corrupt models in ways that cause the application to operate incorrectly …