TRAK: Attributing model behavior at scale

SM Park, K Georgiev, A Ilyas, G Leclerc… - arXiv preprint arXiv …, 2023 - arxiv.org
The goal of data attribution is to trace model predictions back to training data. Despite a long
line of work towards this goal, existing approaches to data attribution tend to force users to …

Data-centric artificial intelligence: A survey

D Zha, ZP Bhat, KH Lai, F Yang, Z Jiang… - arXiv preprint arXiv …, 2023 - arxiv.org
Artificial Intelligence (AI) is making a profound impact in almost every domain. A vital enabler
of its great success is the availability of abundant and high-quality data for building machine …

Data Banzhaf: A robust data valuation framework for machine learning

JT Wang, R Jia - International Conference on Artificial …, 2023 - proceedings.mlr.press
Data valuation has wide use cases in machine learning, including improving data quality
and creating economic incentives for data sharing. This paper studies the robustness of data …

Training data influence analysis and estimation: A survey

Z Hammoudeh, D Lowd - Machine Learning, 2024 - Springer
Good models require good training data. For overparameterized deep models, the causal
relationship between training data and model predictions is increasingly opaque and poorly …

Sample based explanations via generalized representers

CP Tsai, CK Yeh, P Ravikumar - Advances in Neural …, 2024 - proceedings.neurips.cc
We propose a general class of sample based explanations of machine learning models,
which we term generalized representers. To measure the effect of a training sample on a …

Recurrent neural networks enable design of multifunctional synthetic human gut microbiome dynamics

M Baranwal, RL Clark, J Thompson, Z Sun, AO Hero… - eLife, 2022 - elifesciences.org
Predicting the dynamics and functions of microbiomes constructed from the bottom-up is a
key challenge in exploiting them to our benefit. Current models based on ecological theory …

CS-Shapley: class-wise Shapley values for data valuation in classification

S Schoch, H Xu, Y Ji - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Data valuation, or the valuation of individual datum contributions, has seen growing interest
in machine learning due to its demonstrable efficacy for tasks such as noisy label detection …

Intriguing properties of data attribution on diffusion models

X Zheng, T Pang, C Du, J Jiang, M Lin - arXiv preprint arXiv:2311.00500, 2023 - arxiv.org
Data attribution seeks to trace model outputs back to training data. With the recent
development of diffusion models, data attribution has become a desired module to properly …

Data debugging with Shapley importance over end-to-end machine learning pipelines

B Karlaš, D Dao, M Interlandi, B Li, S Schelter… - arXiv preprint arXiv …, 2022 - arxiv.org
Developing modern machine learning (ML) applications is data-centric, of which one
fundamental challenge is to understand the influence of data quality to ML training: "Which …

A privacy-friendly approach to data valuation

JT Wang, Y Zhu, YX Wang, R Jia… - Advances in Neural …, 2024 - proceedings.neurips.cc
Data valuation, a growing field that aims at quantifying the usefulness of individual data
sources for training machine learning (ML) models, faces notable yet often overlooked …