Normalization techniques in training DNNs: Methodology, analysis and application

L Huang, J Qin, Y Zhou, F Zhu, L Liu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Normalization techniques are essential for accelerating the training and improving the
generalization of deep neural networks (DNNs), and have successfully been used in various …
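
As a point of reference for what such layers compute, here is a minimal NumPy sketch of the batch-normalization forward pass in training mode (per-feature batch statistics; the function name, shapes, and toy usage are illustrative, not taken from the survey):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization over a mini-batch x of shape (batch, features)."""
    mean = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature (biased) variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # standardize each feature
    return gamma * x_hat + beta              # learnable scale and shift

# toy usage
x = np.random.randn(32, 8)
y = batch_norm_forward(x, gamma=np.ones(8), beta=np.zeros(8))
print(y.mean(axis=0).round(6), y.std(axis=0).round(3))  # ~0 means, ~1 stds per feature
```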

Understanding the generalization benefit of normalization layers: Sharpness reduction

K Lyu, Z Li, S Arora - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Normalization layers (e.g., Batch Normalization, Layer Normalization) were
introduced to help with optimization difficulties in very deep nets, but they clearly also help …
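
Sharpness here refers to the largest eigenvalue of the loss Hessian. A rough PyTorch sketch of estimating it by power iteration with Hessian-vector products (an illustrative sketch under that reading, not the authors' code; the function name and toy network are assumptions):

```python
import torch

def top_hessian_eigenvalue(loss, params, iters=50):
    """Estimate sharpness (largest Hessian eigenvalue of the loss) via power
    iteration with Hessian-vector products."""
    params = list(params)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat_grad)
    v = v / v.norm()
    eig = 0.0
    for _ in range(iters):
        hv = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv]).detach()
        eig = torch.dot(hv, v).item()        # Rayleigh quotient v^T H v
        v = hv / (hv.norm() + 1e-12)
    return eig

# toy usage: loss of a tiny network on random data
net = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1))
x, y = torch.randn(16, 4), torch.randn(16, 1)
loss = torch.nn.functional.mse_loss(net(x), y)
print(top_hessian_eigenvalue(loss, net.parameters()))
```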

AdamP: Slowing down the slowdown for momentum optimizers on scale-invariant weights

B Heo, S Chun, SJ Oh, D Han, S Yun, G Kim… - arXiv preprint arXiv …, 2020 - arxiv.org
Normalization techniques are a boon for modern deep learning. They let weights converge
more quickly, often with better generalization performance. It has been argued that the …
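
A core idea in this line of work is to remove the radial component of the update for scale-invariant weights, since that component only inflates the weight norm and thereby shrinks the effective step size. A minimal NumPy sketch of the projection (the function name is illustrative; the actual AdamP optimizer combines such a projection with SGD/Adam and a criterion for detecting scale-invariant weights):

```python
import numpy as np

def project_out_radial(update, w, eps=1e-12):
    """Remove the component of an optimizer update parallel to the weight w.
    For scale-invariant weights this radial component only grows ||w|| (and so
    shrinks the effective step size) without changing the network's function."""
    w_hat = w / (np.linalg.norm(w) + eps)
    return update - np.dot(update, w_hat) * w_hat

w = np.random.randn(100)
raw_update = np.random.randn(100)
tangential = project_out_radial(raw_update, w)
print(abs(np.dot(tangential, w)) < 1e-9)  # True: the projected update is orthogonal to w
```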

Fast mixing of stochastic gradient descent with normalization and weight decay

Z Li, T Wang, D Yu - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We prove the Fast Equilibrium Conjecture proposed by Li et al. (2020), i.e.,
stochastic gradient descent (SGD) on a scale-invariant loss (e.g., using networks with various …
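
For intuition about the equilibrium being reached, here is a toy sketch of the weight-norm recursion under SGD with weight decay on a scale-invariant loss; the constant gradient norm g is a simplifying assumption made only for illustration:

```python
# Because a scale-invariant loss has gradients orthogonal to w, one SGD step with
# weight decay gives
#   ||w_{t+1}||^2 = (1 - lr*wd)^2 * ||w_t||^2 + lr^2 * ||grad_t||^2.
# Assuming a constant gradient norm g, the squared norm converges to an
# equilibrium where shrinkage from weight decay balances the gradient updates.
lr, wd, g = 0.1, 5e-4, 1.0
norm_sq = 1.0
for _ in range(200_000):
    norm_sq = (1 - lr * wd) ** 2 * norm_sq + (lr * g) ** 2
print(norm_sq)                                # empirical equilibrium
print(lr * g ** 2 / (2 * wd - lr * wd ** 2))  # fixed point of the recursion
```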

Fed-CO: Cooperation of Online and Offline Models for Severe Data Heterogeneity in Federated Learning

Z Cai, Y Shi, W Huang, J Wang - Advances in Neural …, 2024 - proceedings.neurips.cc
Federated Learning (FL) has emerged as a promising distributed learning paradigm that
enables multiple clients to learn a global model collaboratively without sharing their private …

The implicit bias of batch normalization in linear models and two-layer linear convolutional neural networks

Y Cao, D Zou, Y Li, Q Gu - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We study the implicit bias of batch normalization trained by gradient descent. We show that
when learning a linear model with batch normalization for binary classification, gradient …
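
For concreteness, a sketch of the kind of model analyzed here, assuming a single-output linear predictor with batch normalization over the batch and no bias term (variable names are illustrative). The logits depend on w only through its direction, which is what makes the implicit-bias question nontrivial:

```python
import numpy as np

def bn_linear_logits(w, gamma, X, eps=1e-8):
    """Linear model with batch normalization applied to its scalar output:
    logit_i = gamma * (w . x_i - mean_B) / std_B   (bias term omitted)."""
    z = X @ w
    return gamma * (z - z.mean()) / (z.std() + eps)

X = np.random.randn(128, 10)
w = np.random.randn(10)
# Rescaling w leaves the logits unchanged, so training only steers w's direction.
print(np.allclose(bn_linear_logits(w, 2.0, X), bn_linear_logits(3.0 * w, 2.0, X)))
```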

Batch Normalization Alleviates the Spectral Bias in Coordinate Networks

Z Cai, H Zhu, Q Shen, X Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Representing signals with coordinate networks has recently come to dominate the area of
inverse problems and is widely applied in various scientific computing tasks. Still, there exists an issue …
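
For context, a coordinate network is an MLP that maps a low-dimensional coordinate (e.g., a pixel location) to a signal value, and the remedy studied here inserts batch normalization between layers. A rough PyTorch sketch; the width, depth, and BatchNorm placement are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class CoordinateNet(nn.Module):
    """MLP mapping a 2-D coordinate to a scalar signal value, optionally with
    batch normalization after each hidden linear layer."""
    def __init__(self, hidden=256, depth=4, use_bn=True):
        super().__init__()
        layers, width = [], 2                 # input: (x, y) coordinate
        for _ in range(depth):
            layers.append(nn.Linear(width, hidden))
            if use_bn:
                layers.append(nn.BatchNorm1d(hidden))
            layers.append(nn.ReLU())
            width = hidden
        layers.append(nn.Linear(hidden, 1))   # output: signal value at (x, y)
        self.net = nn.Sequential(*layers)

    def forward(self, coords):                # coords: (N, 2)
        return self.net(coords)

coords = torch.rand(1024, 2) * 2 - 1          # coordinates in [-1, 1]^2
print(CoordinateNet()(coords).shape)          # torch.Size([1024, 1])
```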

Neural tangent kernel empowered federated learning

K Yue, R Jin, R Pilgrim, CW Wong… - International …, 2022 - proceedings.mlr.press
Federated learning (FL) is a privacy-preserving paradigm where multiple participants jointly
solve a machine learning problem without sharing raw data. Unlike traditional distributed …
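
For background, the empirical neural tangent kernel between two inputs is the inner product of their per-example parameter Jacobians. A generic PyTorch sketch of computing such a kernel matrix for a scalar-output model (an illustration of the kernel itself, not of the paper's federated protocol):

```python
import torch

def empirical_ntk(model, x1, x2):
    """Empirical NTK: K[i, j] = <df(x1_i)/dtheta, df(x2_j)/dtheta>
    for a model with a scalar output."""
    def jacobian_rows(x):
        rows = []
        for xi in x:
            out = model(xi.unsqueeze(0)).squeeze()
            grads = torch.autograd.grad(out, tuple(model.parameters()))
            rows.append(torch.cat([g.reshape(-1) for g in grads]))
        return torch.stack(rows)                      # (N, num_params)
    return jacobian_rows(x1) @ jacobian_rows(x2).T    # (N1, N2) kernel matrix

net = torch.nn.Sequential(torch.nn.Linear(4, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
K = empirical_ntk(net, torch.randn(5, 4), torch.randn(3, 4))
print(K.shape)  # torch.Size([5, 3])
```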

A modular analysis of provable acceleration via Polyak's momentum: Training a wide ReLU network and a deep linear network

JK Wang, CH Lin, JD Abernethy - … Conference on Machine …, 2021 - proceedings.mlr.press
Incorporating a so-called “momentum” dynamic into gradient descent methods is common
in neural net training, as it has been broadly observed that, at least empirically, it often leads …
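
Polyak's momentum is the heavy-ball update w_{t+1} = w_t - lr * grad(w_t) + beta * (w_t - w_{t-1}). A small self-contained sketch on a strongly convex quadratic, with step size and momentum chosen arbitrarily for illustration:

```python
import numpy as np

def heavy_ball(grad, w0, lr=0.01, beta=0.9, steps=500):
    """Polyak's heavy-ball method: w_{t+1} = w_t - lr*grad(w_t) + beta*(w_t - w_{t-1})."""
    w_prev, w = w0.copy(), w0.copy()
    for _ in range(steps):
        w_next = w - lr * grad(w) + beta * (w - w_prev)
        w_prev, w = w, w_next
    return w

# toy usage: minimize the quadratic 0.5 * w^T A w - b^T w
A = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])
w_star = heavy_ball(lambda w: A @ w - b, w0=np.zeros(2))
print(np.allclose(w_star, np.linalg.solve(A, b), atol=1e-4))  # True
```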

Geometry of linear convolutional networks

K Kohn, T Merkh, G Montúfar, M Trager - SIAM Journal on Applied Algebra and …, 2022 - SIAM
We study the family of functions that are represented by a linear convolutional network
(LCN). These functions form a semi-algebraic subset of the set of linear maps from input …
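
With no nonlinearities, stacking convolutional layers is the same as convolving once with the composed filter, i.e., the product of the layer filters viewed as polynomials; this is the structure the semi-algebraic description builds on. A quick NumPy check (signal and filter lengths are arbitrary):

```python
import numpy as np

x = np.random.randn(50)
f1, f2 = np.random.randn(3), np.random.randn(5)

two_layers = np.convolve(np.convolve(x, f1), f2)   # apply the layers one by one
one_layer = np.convolve(x, np.convolve(f1, f2))    # single composed filter
print(np.allclose(two_layers, one_layer))          # True: convolution is associative
```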