User-friendly introduction to PAC-Bayes bounds
P Alquier - Foundations and Trends® in Machine Learning, 2024 - nowpublishers.com
Aggregated predictors are obtained by making a set of basic predictors vote according to
some weights, that is, to some probability distribution. Randomized predictors are obtained …
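For context, a representative PAC-Bayes bound of the kind surveyed here (the McAllester/Maurer form, stated from the well-known result rather than from this truncated snippet; constants and logarithmic factors vary across versions) reads: for a [0,1]-valued loss, prior \pi, and any \delta \in (0,1), with probability at least 1-\delta over an i.i.d. sample of size n, simultaneously for all posteriors \rho,
\[
\mathbb{E}_{\theta\sim\rho}[R(\theta)] \;\le\; \mathbb{E}_{\theta\sim\rho}[r_n(\theta)] + \sqrt{\frac{\mathrm{KL}(\rho\|\pi) + \ln\frac{2\sqrt{n}}{\delta}}{2n}},
\]
where R denotes the population risk and r_n the empirical risk.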
Recent advances in deep learning theory
Deep learning is usually described as an experiment-driven field under continuous criticism for lacking theoretical foundations. This problem has been partially addressed by a large volume of …
Reasoning about generalization via conditional mutual information
T Steinke, L Zakynthinou - Conference on Learning Theory, 2020 - proceedings.mlr.press
We provide an information-theoretic framework for studying the generalization properties of
machine learning algorithms. Our framework ties together existing approaches, including …
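For reference, the conditional mutual information (CMI) bound of Steinke and Zakynthinou takes roughly the following form (quoted as the standard result, not from the truncated snippet): for a loss bounded in [0,1] and sample size n,
\[
\bigl|\mathbb{E}[\,R(W) - r_n(W)\,]\bigr| \;\le\; \sqrt{\frac{2\,\mathrm{CMI}(\mathcal{A})}{n}},
\]
where CMI(\mathcal{A}) = I(W; U \mid \tilde{Z}) is the mutual information between the algorithm's output W and the selection variables U, conditioned on a supersample \tilde{Z} of 2n points.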
Tightening mutual information-based bounds on generalization error
An information-theoretic upper bound on the generalization error of supervised learning
algorithms is derived. The bound is constructed in terms of the mutual information between …
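As background, the mutual-information bound referred to here (Xu and Raginsky) and its individual-sample tightening are usually stated as follows (standard results, paraphrased rather than quoted): for a loss that is \sigma-sub-Gaussian under the data distribution,
\[
\bigl|\mathbb{E}[\mathrm{gen}]\bigr| \;\le\; \sqrt{\frac{2\sigma^2\, I(W;S)}{n}},
\qquad
\bigl|\mathbb{E}[\mathrm{gen}]\bigr| \;\le\; \frac{1}{n}\sum_{i=1}^{n}\sqrt{2\sigma^2\, I(W;Z_i)},
\]
where S = (Z_1, \ldots, Z_n) is the training sample and W the algorithm's output.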
Information-theoretic generalization bounds for stochastic gradient descent
We study the generalization properties of the popular stochastic optimization method known
as stochastic gradient descent (SGD) for optimizing general non-convex loss functions. Our …
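For clarity, the iteration analyzed in this line of work is minibatch SGD; written out (with notation assumed here, not taken from the snippet),
\[
w_{t+1} \;=\; w_t - \eta_t\,\frac{1}{|B_t|}\sum_{i\in B_t} \nabla \ell(w_t, z_i),
\]
with step sizes \eta_t and minibatches B_t drawn from the training sample.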
Sharpened generalization bounds based on conditional mutual information and an application to noisy, iterative algorithms
The information-theoretic framework of Russo and Zou (2016) and Xu and Raginsky (2017)
provides bounds on the generalization error of a learning algorithm in terms of the mutual …
The dynamics of sharpness-aware minimization: Bouncing across ravines and drifting towards wide minima
PL Bartlett, PM Long, O Bousquet - Journal of Machine Learning Research, 2023 - jmlr.org
We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization method
for deep networks that has exhibited performance improvements on image and language …
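For context, the SAM update studied in this paper is, in its standard form due to Foret et al. (reproduced here as background; the paper's specific assumptions and analysis are omitted), a two-step rule that perturbs the weights along the normalized gradient before the descent step:
\[
\hat{\varepsilon}_t = \rho\,\frac{\nabla L(w_t)}{\|\nabla L(w_t)\|},
\qquad
w_{t+1} = w_t - \eta\,\nabla L\bigl(w_t + \hat{\varepsilon}_t\bigr),
\]
which approximates the min-max objective \min_w \max_{\|\varepsilon\|\le\rho} L(w+\varepsilon).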
On the role of data in PAC-Bayes bounds
The dominant term in PAC-Bayes bounds is often the Kullback-Leibler divergence between
the posterior and prior. For so-called linear PAC-Bayes risk bounds based on the empirical …
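The "linear" PAC-Bayes risk bounds mentioned here are typically of the following form (standard statement for a [0,1]-valued loss and a fixed \lambda > 0; not quoted from the snippet): with probability at least 1-\delta, for all posteriors \rho,
\[
\mathbb{E}_{\theta\sim\rho}[R(\theta)] \;\le\; \mathbb{E}_{\theta\sim\rho}[r_n(\theta)] + \frac{\mathrm{KL}(\rho\|\pi) + \ln\frac{1}{\delta}}{\lambda} + \frac{\lambda}{8n},
\]
which makes the KL(\rho\|\pi) term the dominant quantity the paper focuses on.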
Randomized adversarial training via Taylor expansion
In recent years, there has been an explosion of research into developing more robust deep
neural networks against adversarial examples. Adversarial training appears as one of the …
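For background, adversarial training is usually formulated as the min-max problem of Madry et al. (standard formulation, not specific to this paper's randomized, Taylor-expansion variant):
\[
\min_{\theta}\; \mathbb{E}_{(x,y)}\Bigl[\max_{\|\delta\|_p \le \epsilon} \ell\bigl(f_\theta(x+\delta),\, y\bigr)\Bigr],
\]
where the inner maximization is approximated in practice, e.g. by projected gradient ascent.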
Shape matters: Understanding the implicit bias of the noise covariance
The noise in stochastic gradient descent (SGD) provides a crucial implicit regularization
effect for training overparameterized models. Prior theoretical work largely focuses on …
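As context for the "shape" of the noise discussed here: for minibatches of size B sampled with replacement, the stochastic gradient is unbiased and its covariance scales as (a standard computation, with this notation assumed rather than taken from the snippet)
\[
\mathrm{Cov}\bigl[g_B(w)\bigr] \;=\; \frac{1}{B}\Bigl(\frac{1}{n}\sum_{i=1}^{n} \nabla\ell_i(w)\,\nabla\ell_i(w)^{\top} - \nabla L(w)\,\nabla L(w)^{\top}\Bigr),
\]
so the noise covariance inherits its shape from the per-example gradients rather than being isotropic.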