Sharpness minimization algorithms do not only minimize sharpness to achieve better generalization

K Wen, Z Li, T Ma - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Despite extensive studies, the underlying reason as to why overparameterized neural
networks can generalize remains elusive. Existing theory shows that common stochastic …

How Sharpness-Aware Minimization Minimizes Sharpness?

K Wen, T Ma, Z Li - The Eleventh International Conference on …, 2023 - openreview.net
Sharpness-Aware Minimization (SAM) is a highly effective regularization technique for
improving the generalization of deep neural networks for various settings. However, the …
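
A rough, hedged illustration of the SAM update discussed in this entry: a minimal NumPy sketch of a single SAM step on a toy quadratic loss. The loss, the neighborhood radius rho, and the learning rate are illustrative assumptions, not values from the paper.

```python
import numpy as np

def loss(w):
    return 0.5 * np.sum(w ** 2)   # toy quadratic loss (assumption)

def grad(w):
    return w                      # its gradient

def sam_step(w, lr=0.1, rho=0.05):
    g = grad(w)
    # Ascent step: move to the approximate worst-case point in a rho-ball around w.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Descent step: update w using the gradient evaluated at the perturbed point.
    return w - lr * grad(w + eps)

w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w)
print(loss(w))   # approaches zero, the minimum of the toy loss
```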

Gradient descent with linearly correlated noise: Theory and applications to differential privacy

A Koloskova, R McKenna, Z Charles… - Advances in …, 2023 - proceedings.neurips.cc
We study gradient descent under linearly correlated noise. Our work is motivated by recent
practical methods for optimization with differential privacy (DP), such as DP-FTRL, which …
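
A hedged sketch of the setting named here (not the DP-FTRL mechanism itself): gradient descent where the noise added at step t is a fixed linear combination of shared i.i.d. Gaussian variables, so the noise is linearly correlated across iterations. The toy loss, the correlating matrix B, and all constants are illustrative assumptions.

```python
import numpy as np

def grad(w):
    return w   # gradient of a toy quadratic loss (assumption)

T, d = 50, 2
Z = np.random.randn(T, d)                                     # shared i.i.d. Gaussian sources
B = np.tril(np.ones((T, T))) / np.arange(1, T + 1)[:, None]   # rows are running averages
noise = B @ Z                                                 # linearly correlated noise sequence

w = np.array([1.0, -2.0])
for t in range(T):
    w = w - 0.1 * (grad(w) + 0.1 * noise[t])   # GD step with correlated injected noise
print(w)
```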

How does sharpness-aware minimization minimize sharpness?

K Wen, T Ma, Z Li - arXiv preprint arXiv:2211.05729, 2022 - arxiv.org
Sharpness-Aware Minimization (SAM) is a highly effective regularization technique for
improving the generalization of deep neural networks for various settings. However, the …

Optimized injection of noise in activation functions to improve generalization of neural networks

F Duan, F Chapeau-Blondeau, D Abbott - Chaos, Solitons & Fractals, 2024 - Elsevier
This paper proposes a flexible probabilistic activation function that enhances the training
and operation of artificial neural networks by intentionally injecting noise to gain additional …
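
A schematic, hedged sketch of noise injection in an activation function (not the specific flexible probabilistic activation proposed in this paper): Gaussian noise is added to the pre-activation during training only; the noise level sigma is an illustrative assumption rather than an optimized injection level.

```python
import numpy as np

def noisy_relu(z, sigma=0.1, training=True):
    if training:
        z = z + sigma * np.random.randn(*z.shape)   # inject noise before the nonlinearity
    return np.maximum(z, 0.0)

x = np.random.randn(4, 8)
print(noisy_relu(x, training=True).shape)    # noisy forward pass during training
print(noisy_relu(x, training=False).shape)   # deterministic forward pass at inference
```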

Implicit regularization in heavy-ball momentum accelerated stochastic gradient descent

A Ghosh, H Lyu, X Zhang, R Wang - arXiv preprint arXiv:2302.00849, 2023 - arxiv.org
It is well known that the finite step-size ($h$) in Gradient Descent (GD) implicitly regularizes
solutions to flatter minima. A natural question to ask is "Does the momentum parameter …
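
A minimal sketch of the heavy-ball (Polyak) momentum update referenced in this entry, on a toy quadratic with noisy gradients; the step size h, momentum beta, and noise scale are illustrative assumptions.

```python
import numpy as np

def noisy_grad(w, noise_scale=0.01):
    return w + noise_scale * np.random.randn(*w.shape)   # stochastic gradient of a toy quadratic

def heavy_ball_sgd(w0, h=0.1, beta=0.9, steps=200):
    w, v = w0.copy(), np.zeros_like(w0)
    for _ in range(steps):
        v = beta * v - h * noisy_grad(w)   # momentum buffer accumulates past gradients
        w = w + v                          # heavy-ball update
    return w

print(heavy_ball_sgd(np.array([1.0, -2.0])))
```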

On the theoretical properties of noise correlation in stochastic optimization

A Lucchi, F Proske, A Orvieto… - Advances in Neural …, 2022 - proceedings.neurips.cc
Studying the properties of stochastic noise to optimize complex non-convex functions has
been an active area of research in the field of machine learning. Prior work …

PAC-tuning: Fine-tuning Pretrained Language Models with PAC-driven Perturbed Gradient Descent

G Liu, Z Xue, X Zhang, KM Johnson… - arXiv preprint arXiv …, 2023 - arxiv.org
Fine-tuning pretrained language models (PLMs) for downstream tasks is a large-scale
optimization problem, in which the choice of the training algorithm critically determines how …
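
A hedged sketch of generic perturbed gradient descent (not the PAC-driven variant developed in the paper): parameters are perturbed with Gaussian noise before each gradient evaluation; the toy loss and the noise level sigma are illustrative assumptions.

```python
import numpy as np

def grad(w):
    return w   # gradient of a toy quadratic loss (assumption)

w, lr, sigma = np.array([1.0, -2.0]), 0.1, 0.05
for _ in range(200):
    w_noisy = w + sigma * np.random.randn(*w.shape)   # perturb the parameters
    w = w - lr * grad(w_noisy)                        # descend along the perturbed gradient
print(w)
```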

GIFT-SW: Gaussian noise injected fine-tuning of salient weights for LLMs

M Zhelnin, V Moskvoretskii, E Shvetsov… - arXiv preprint arXiv …, 2024 - arxiv.org
Parameter Efficient Fine-Tuning (PEFT) methods have gained popularity and democratized
the usage of Large Language Models (LLMs). Recent studies have shown that a small …

Why is parameter averaging beneficial in SGD? An objective smoothing perspective

A Nitanda, R Kikuchi, S Maeda… - … Conference on Artificial …, 2024 - proceedings.mlr.press
It is often observed that stochastic gradient descent (SGD) and its variants implicitly select a
solution with good generalization performance; such implicit bias is often characterized in …
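
A minimal sketch of tail parameter averaging over SGD iterates (in the spirit of averaged SGD / stochastic weight averaging), on a toy quadratic with noisy gradients; the burn-in length and all constants are illustrative assumptions.

```python
import numpy as np

def noisy_grad(w):
    return w + 0.05 * np.random.randn(*w.shape)   # stochastic gradient of a toy quadratic

w = np.array([1.0, -2.0])
avg, n_avg = np.zeros_like(w), 0
for t in range(500):
    w = w - 0.1 * noisy_grad(w)     # plain SGD step
    if t >= 250:                    # start averaging after a burn-in phase
        n_avg += 1
        avg += (w - avg) / n_avg    # running mean of the tail iterates
print(avg)                          # averaged iterate, typically closer to the minimum
```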