Learning from data with structured missingness
Missing data are an unavoidable complication in many machine learning tasks. When data
are 'missing at random' there exist a range of tools and techniques to deal with the issue …
Data collection and quality challenges in deep learning: A data-centric ai perspective
Data-centric AI is at the center of a fundamental shift in software engineering where machine
learning becomes the new software, powered by big data and computing infrastructure …
Rab: Provable robustness against backdoor attacks
Recent studies have shown that deep neural networks (DNNs) are vulnerable to
adversarial attacks, including evasion and backdoor (poisoning) attacks. On the defense …
Opportunities and Challenges in Data-Centric AI
Artificial intelligence (AI) systems are trained to solve complex problems and learn to
perform specific tasks by using large volumes of data, such as prediction, classification …
A data quality-driven view of mlops
Developing machine learning models can be seen as a process similar to the one
established for traditional software development. A key difference between the two lies in the …
Data management for machine learning: A survey
Machine learning (ML) has widespread applications and has revolutionized many
industries, but suffers from several challenges. First, sufficient high-quality training data is …
Cleanml: A study for evaluating the impact of data cleaning on ml classification tasks
Data quality affects machine learning (ML) model performances, and data scientists spend
a considerable amount of time on data cleaning before model training. However, to date, there …
Vf-ps: How to select important participants in vertical federated learning, efficiently and securely?
Vertical Federated Learning (VFL), which trains federated models over vertically
partitioned data, has emerged as an important learning paradigm. However, existing VFL …
Goodcore: Data-effective and data-efficient machine learning through coreset selection over incomplete data
Given a dataset with incomplete data (e.g., missing values), training a machine learning
model over the incomplete data requires two steps. First, it requires a data-effective step that …
From Cleaning before ML to Cleaning for ML
Data cleaning is widely regarded as a critical piece of machine learning (ML) applications,
as data errors can corrupt models in ways that cause the application to operate incorrectly …