User-friendly introduction to PAC-Bayes bounds
P Alquier - Foundations and Trends® in Machine Learning, 2024 - nowpublishers.com
Aggregated predictors are obtained by making a set of basic predictors vote according to
some weights, that is, to some probability distribution. Randomized predictors are obtained …
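For context, a representative PAC-Bayes bound of the kind surveyed here (the McAllester/Maurer form, stated from the well-known result rather than from this truncated snippet; constants and logarithmic factors vary across versions) reads: for a [0,1]-valued loss, prior \pi, and any \delta \in (0,1), with probability at least 1-\delta over an i.i.d. sample of size n, simultaneously for all posteriors \rho,
\[
\mathbb{E}_{\theta\sim\rho}[R(\theta)] \;\le\; \mathbb{E}_{\theta\sim\rho}[r_n(\theta)] + \sqrt{\frac{\mathrm{KL}(\rho\|\pi) + \ln\frac{2\sqrt{n}}{\delta}}{2n}},
\]
where R denotes the population risk and r_n the empirical risk.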
Recent advances in deep learning theory
Deep learning is usually described as an experiment-driven field under continuous criticism for lacking theoretical foundations. This problem has been partially addressed by a large volume of …
Reasoning about generalization via conditional mutual information
T Steinke, L Zakynthinou - Conference on Learning Theory, 2020 - proceedings.mlr.press
We provide an information-theoretic framework for studying the generalization properties of
machine learning algorithms. Our framework ties together existing approaches, including …
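For reference, the conditional mutual information (CMI) bound of Steinke and Zakynthinou takes roughly the following form (quoted as the standard result, not from the truncated snippet): for a loss bounded in [0,1] and sample size n,
\[
\bigl|\mathbb{E}[\,R(W) - r_n(W)\,]\bigr| \;\le\; \sqrt{\frac{2\,\mathrm{CMI}(\mathcal{A})}{n}},
\]
where CMI(\mathcal{A}) = I(W; U \mid \tilde{Z}) is the mutual information between the algorithm's output W and the selection variables U, conditioned on a supersample \tilde{Z} of 2n points.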
Tightening mutual information-based bounds on generalization error
An information-theoretic upper bound on the generalization error of supervised learning
algorithms is derived. The bound is constructed in terms of the mutual information between …
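As background, the mutual-information bound referred to here (Xu and Raginsky) and its individual-sample tightening are usually stated as follows (standard results, paraphrased rather than quoted): for a loss that is \sigma-sub-Gaussian under the data distribution,
\[
\bigl|\mathbb{E}[\mathrm{gen}]\bigr| \;\le\; \sqrt{\frac{2\sigma^2\, I(W;S)}{n}},
\qquad
\bigl|\mathbb{E}[\mathrm{gen}]\bigr| \;\le\; \frac{1}{n}\sum_{i=1}^{n}\sqrt{2\sigma^2\, I(W;Z_i)},
\]
where S = (Z_1, \ldots, Z_n) is the training sample and W the algorithm's output.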
Information-theoretic generalization bounds for stochastic gradient descent
We study the generalization properties of the popular stochastic optimization method known
as stochastic gradient descent (SGD) for optimizing general non-convex loss functions. Our …
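For clarity, the iteration analyzed in this line of work is minibatch SGD; written out (with notation assumed here, not taken from the snippet),
\[
w_{t+1} \;=\; w_t - \eta_t\,\frac{1}{|B_t|}\sum_{i\in B_t} \nabla \ell(w_t, z_i),
\]
with step sizes \eta_t and minibatches B_t drawn from the training sample.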
Sharpened generalization bounds based on conditional mutual information and an application to noisy, iterative algorithms
The information-theoretic framework of Russo and Zou (2016) and Xu and Raginsky (2017)
provides bounds on the generalization error of a learning algorithm in terms of the mutual …
The dynamics of sharpness-aware minimization: Bouncing across ravines and drifting towards wide minima
PL Bartlett, PM Long, O Bousquet - Journal of Machine Learning Research, 2023 - jmlr.org
We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization method
for deep networks that has exhibited performance improvements on image and language …
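For context, the SAM update studied in this paper is, in its standard form due to Foret et al. (reproduced here as background; the paper's specific assumptions and analysis are omitted), a two-step rule that perturbs the weights along the normalized gradient before the descent step:
\[
\hat{\varepsilon}_t = \rho\,\frac{\nabla L(w_t)}{\|\nabla L(w_t)\|},
\qquad
w_{t+1} = w_t - \eta\,\nabla L\bigl(w_t + \hat{\varepsilon}_t\bigr),
\]
which approximates the min-max objective \min_w \max_{\|\varepsilon\|\le\rho} L(w+\varepsilon).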
On the role of data in PAC-Bayes bounds
The dominant term in PAC-Bayes bounds is often the Kullback-Leibler divergence between
the posterior and prior. For so-called linear PAC-Bayes risk bounds based on the empirical …
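The "linear" PAC-Bayes risk bounds mentioned here are typically of the following form (standard statement for a [0,1]-valued loss and a fixed \lambda > 0; not quoted from the snippet): with probability at least 1-\delta, for all posteriors \rho,
\[
\mathbb{E}_{\theta\sim\rho}[R(\theta)] \;\le\; \mathbb{E}_{\theta\sim\rho}[r_n(\theta)] + \frac{\mathrm{KL}(\rho\|\pi) + \ln\frac{1}{\delta}}{\lambda} + \frac{\lambda}{8n},
\]
which makes the KL(\rho\|\pi) term the dominant quantity the paper focuses on.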
Randomized adversarial training via Taylor expansion
In recent years, there has been an explosion of research into developing more robust deep
neural networks against adversarial examples. Adversarial training appears as one of the …
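For background, adversarial training is usually formulated as the min-max problem of Madry et al. (standard formulation, not specific to this paper's randomized, Taylor-expansion variant):
\[
\min_{\theta}\; \mathbb{E}_{(x,y)}\Bigl[\max_{\|\delta\|_p \le \epsilon} \ell\bigl(f_\theta(x+\delta),\, y\bigr)\Bigr],
\]
where the inner maximization is approximated in practice, e.g. by projected gradient ascent.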
Shape matters: Understanding the implicit bias of the noise covariance
The noise in stochastic gradient descent (SGD) provides a crucial implicit regularization
effect for training overparameterized models. Prior theoretical work largely focuses on …
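As context for the "shape" of the noise discussed here: for minibatches of size B sampled with replacement, the stochastic gradient is unbiased and its covariance scales as (a standard computation, with this notation assumed rather than taken from the snippet)
\[
\mathrm{Cov}\bigl[g_B(w)\bigr] \;=\; \frac{1}{B}\Bigl(\frac{1}{n}\sum_{i=1}^{n} \nabla\ell_i(w)\,\nabla\ell_i(w)^{\top} - \nabla L(w)\,\nabla L(w)^{\top}\Bigr),
\]
so the noise covariance inherits its shape from the per-example gradients rather than being isotropic.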