Data cleaning: Overview and emerging challenges

X Chu, IF Ilyas, S Krishnan, J Wang - Proceedings of the 2016 …, 2016 - dl.acm.org
Detecting and repairing dirty data is one of the perennial challenges in data analytics, and
failure to do so can result in inaccurate analytics and unreliable decisions. Over the past few …

To compress or not to compress—self-supervised learning and information theory: A review

R Shwartz-Ziv, Y LeCun - Entropy, 2024 - mdpi.com
Deep neural networks excel in supervised learning tasks but are constrained by the need for
extensive labeled data. Self-supervised learning emerges as a promising alternative …

User-friendly introduction to PAC-Bayes bounds

P Alquier - Foundations and Trends® in Machine Learning, 2024 - nowpublishers.com
Aggregated predictors are obtained by making a set of basic predictors vote according to
some weights, that is, to some probability distribution. Randomized predictors are obtained …

A finite time analysis of temporal difference learning with linear function approximation

J Bhandari, D Russo, R Singal - Conference on learning …, 2018 - proceedings.mlr.press
Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value
function corresponding to a given policy in a Markov decision process. Although TD is one of …

Multiaccuracy: Black-box post-processing for fairness in classification

MP Kim, A Ghorbani, J Zou - Proceedings of the 2019 AAAI/ACM …, 2019 - dl.acm.org
Prediction systems are successfully deployed in applications ranging from disease
diagnosis, to predicting credit worthiness, to image recognition. Even when the overall …

Information-theoretic analysis of generalization capability of learning algorithms

A Xu, M Raginsky - Advances in neural information …, 2017 - proceedings.neurips.cc
We derive upper bounds on the generalization error of a learning algorithm in terms of the
mutual information between its input and output. The bounds provide an information …

Deep exploration via randomized value functions

I Osband, B Van Roy, DJ Russo, Z Wen - Journal of Machine Learning …, 2019 - jmlr.org
We study the use of randomized value functions to guide deep exploration in reinforcement
learning. This offers an elegant means for synthesizing statistically and computationally …

A translational perspective towards clinical AI fairness

M Liu, Y Ning, S Teixayavong, M Mertens, J Xu… - NPJ Digital …, 2023 - nature.com
Artificial intelligence (AI) has demonstrated the ability to extract insights from data, but the
fairness of such data-driven insights remains a concern in high-stakes fields. Despite …

Reasoning about generalization via conditional mutual information

T Steinke, L Zakynthinou - Conference on Learning Theory, 2020 - proceedings.mlr.press
We provide an information-theoretic framework for studying the generalization properties of
machine learning algorithms. Our framework ties together existing approaches, including …

Simple Bayesian algorithms for best arm identification

D Russo - Conference on Learning Theory, 2016 - proceedings.mlr.press
This paper considers the optimal adaptive allocation of measurement effort for identifying the
best among a finite set of options or designs. An experimenter sequentially chooses designs …