Normalization techniques in training DNNs: Methodology, analysis and application

L Huang, J Qin, Y Zhou, F Zhu, L Liu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Normalization techniques are essential for accelerating the training and improving the
generalization of deep neural networks (DNNs), and have successfully been used in various …
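
As a point of reference for what such layers compute, here is a minimal NumPy sketch of the batch-normalization forward pass in training mode (per-feature batch statistics; the function name, shapes, and toy usage are illustrative, not taken from the survey):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization over a mini-batch x of shape (batch, features)."""
    mean = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature (biased) variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # standardize each feature
    return gamma * x_hat + beta              # learnable scale and shift

# toy usage
x = np.random.randn(32, 8)
y = batch_norm_forward(x, gamma=np.ones(8), beta=np.zeros(8))
print(y.mean(axis=0).round(6), y.std(axis=0).round(3))  # ~0 means, ~1 stds per feature
```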

Understanding the generalization benefit of normalization layers: Sharpness reduction

K Lyu, Z Li, S Arora - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Normalization layers (e.g., Batch Normalization, Layer Normalization) were
introduced to help with optimization difficulties in very deep nets, but they clearly also help …
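
Sharpness here refers to the largest eigenvalue of the loss Hessian. A rough PyTorch sketch of estimating it by power iteration with Hessian-vector products (an illustrative sketch under that reading, not the authors' code; the function name and toy network are assumptions):

```python
import torch

def top_hessian_eigenvalue(loss, params, iters=50):
    """Estimate sharpness (largest Hessian eigenvalue of the loss) via power
    iteration with Hessian-vector products."""
    params = list(params)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat_grad)
    v = v / v.norm()
    eig = 0.0
    for _ in range(iters):
        hv = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv]).detach()
        eig = torch.dot(hv, v).item()        # Rayleigh quotient v^T H v
        v = hv / (hv.norm() + 1e-12)
    return eig

# toy usage: loss of a tiny network on random data
net = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1))
x, y = torch.randn(16, 4), torch.randn(16, 1)
loss = torch.nn.functional.mse_loss(net(x), y)
print(top_hessian_eigenvalue(loss, net.parameters()))
```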

AdamP: Slowing down the slowdown for momentum optimizers on scale-invariant weights

B Heo, S Chun, SJ Oh, D Han, S Yun, G Kim… - arXiv preprint arXiv …, 2020 - arxiv.org
Normalization techniques are a boon for modern deep learning. They let weights converge
more quickly, often with better generalization performance. It has been argued that the …
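
A core idea in this line of work is to remove the radial component of the update for scale-invariant weights, since that component only inflates the weight norm and thereby shrinks the effective step size. A minimal NumPy sketch of the projection (the function name is illustrative; the actual AdamP optimizer combines such a projection with SGD/Adam and a criterion for detecting scale-invariant weights):

```python
import numpy as np

def project_out_radial(update, w, eps=1e-12):
    """Remove the component of an optimizer update parallel to the weight w.
    For scale-invariant weights this radial component only grows ||w|| (and so
    shrinks the effective step size) without changing the network's function."""
    w_hat = w / (np.linalg.norm(w) + eps)
    return update - np.dot(update, w_hat) * w_hat

w = np.random.randn(100)
raw_update = np.random.randn(100)
tangential = project_out_radial(raw_update, w)
print(abs(np.dot(tangential, w)) < 1e-9)  # True: the projected update is orthogonal to w
```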

Fast mixing of stochastic gradient descent with normalization and weight decay

Z Li, T Wang, D Yu - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We prove the Fast Equilibrium Conjecture proposed by Li et al. (2020), i.e.,
stochastic gradient descent (SGD) on a scale-invariant loss (e.g., using networks with various …
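
For intuition about the equilibrium being reached, here is a toy sketch of the weight-norm recursion under SGD with weight decay on a scale-invariant loss; the constant gradient norm g is a simplifying assumption made only for illustration:

```python
# Because a scale-invariant loss has gradients orthogonal to w, one SGD step with
# weight decay gives
#   ||w_{t+1}||^2 = (1 - lr*wd)^2 * ||w_t||^2 + lr^2 * ||grad_t||^2.
# Assuming a constant gradient norm g, the squared norm converges to an
# equilibrium where shrinkage from weight decay balances the gradient updates.
lr, wd, g = 0.1, 5e-4, 1.0
norm_sq = 1.0
for _ in range(200_000):
    norm_sq = (1 - lr * wd) ** 2 * norm_sq + (lr * g) ** 2
print(norm_sq)                                # empirical equilibrium
print(lr * g ** 2 / (2 * wd - lr * wd ** 2))  # fixed point of the recursion
```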

Fed-CO: Cooperation of Online and Offline Models for Severe Data Heterogeneity in Federated Learning

Z Cai, Y Shi, W Huang, J Wang - Advances in Neural …, 2024 - proceedings.neurips.cc
Federated Learning (FL) has emerged as a promising distributed learning paradigm that
enables multiple clients to learn a global model collaboratively without sharing their private …

The implicit bias of batch normalization in linear models and two-layer linear convolutional neural networks

Y Cao, D Zou, Y Li, Q Gu - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We study the implicit bias of batch normalization trained by gradient descent. We show that
when learning a linear model with batch normalization for binary classification, gradient …
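
For concreteness, a sketch of the kind of model analyzed here, assuming a single-output linear predictor with batch normalization over the batch and no bias term (variable names are illustrative). The logits depend on w only through its direction, which is what makes the implicit-bias question nontrivial:

```python
import numpy as np

def bn_linear_logits(w, gamma, X, eps=1e-8):
    """Linear model with batch normalization applied to its scalar output:
    logit_i = gamma * (w . x_i - mean_B) / std_B   (bias term omitted)."""
    z = X @ w
    return gamma * (z - z.mean()) / (z.std() + eps)

X = np.random.randn(128, 10)
w = np.random.randn(10)
# Rescaling w leaves the logits unchanged, so training only steers w's direction.
print(np.allclose(bn_linear_logits(w, 2.0, X), bn_linear_logits(3.0 * w, 2.0, X)))
```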

Batch Normalization Alleviates the Spectral Bias in Coordinate Networks

Z Cai, H Zhu, Q Shen, X Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Representing signals with coordinate networks has recently come to dominate the area of
inverse problems and is widely applied in various scientific computing tasks. Still, there exists an issue …
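
For context, a coordinate network is an MLP that maps a low-dimensional coordinate (e.g., a pixel location) to a signal value, and the remedy studied here inserts batch normalization between layers. A rough PyTorch sketch; the width, depth, and BatchNorm placement are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class CoordinateNet(nn.Module):
    """MLP mapping a 2-D coordinate to a scalar signal value, optionally with
    batch normalization after each hidden linear layer."""
    def __init__(self, hidden=256, depth=4, use_bn=True):
        super().__init__()
        layers, width = [], 2                 # input: (x, y) coordinate
        for _ in range(depth):
            layers.append(nn.Linear(width, hidden))
            if use_bn:
                layers.append(nn.BatchNorm1d(hidden))
            layers.append(nn.ReLU())
            width = hidden
        layers.append(nn.Linear(hidden, 1))   # output: signal value at (x, y)
        self.net = nn.Sequential(*layers)

    def forward(self, coords):                # coords: (N, 2)
        return self.net(coords)

coords = torch.rand(1024, 2) * 2 - 1          # coordinates in [-1, 1]^2
print(CoordinateNet()(coords).shape)          # torch.Size([1024, 1])
```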

Neural tangent kernel empowered federated learning

K Yue, R Jin, R Pilgrim, CW Wong… - International …, 2022 - proceedings.mlr.press
Federated learning (FL) is a privacy-preserving paradigm where multiple participants jointly
solve a machine learning problem without sharing raw data. Unlike traditional distributed …
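
For background, the empirical neural tangent kernel between two inputs is the inner product of their per-example parameter Jacobians. A generic PyTorch sketch of computing such a kernel matrix for a scalar-output model (an illustration of the kernel itself, not of the paper's federated protocol):

```python
import torch

def empirical_ntk(model, x1, x2):
    """Empirical NTK: K[i, j] = <df(x1_i)/dtheta, df(x2_j)/dtheta>
    for a model with a scalar output."""
    def jacobian_rows(x):
        rows = []
        for xi in x:
            out = model(xi.unsqueeze(0)).squeeze()
            grads = torch.autograd.grad(out, tuple(model.parameters()))
            rows.append(torch.cat([g.reshape(-1) for g in grads]))
        return torch.stack(rows)                      # (N, num_params)
    return jacobian_rows(x1) @ jacobian_rows(x2).T    # (N1, N2) kernel matrix

net = torch.nn.Sequential(torch.nn.Linear(4, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
K = empirical_ntk(net, torch.randn(5, 4), torch.randn(3, 4))
print(K.shape)  # torch.Size([5, 3])
```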

A modular analysis of provable acceleration via Polyak's momentum: Training a wide ReLU network and a deep linear network

JK Wang, CH Lin, JD Abernethy - … Conference on Machine …, 2021 - proceedings.mlr.press
Incorporating a so-called “momentum” dynamic into gradient descent methods is common
in neural net training, as it has been broadly observed that, at least empirically, it often leads …
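
Polyak's momentum is the heavy-ball update w_{t+1} = w_t - lr * grad(w_t) + beta * (w_t - w_{t-1}). A small self-contained sketch on a strongly convex quadratic, with step size and momentum chosen arbitrarily for illustration:

```python
import numpy as np

def heavy_ball(grad, w0, lr=0.01, beta=0.9, steps=500):
    """Polyak's heavy-ball method: w_{t+1} = w_t - lr*grad(w_t) + beta*(w_t - w_{t-1})."""
    w_prev, w = w0.copy(), w0.copy()
    for _ in range(steps):
        w_next = w - lr * grad(w) + beta * (w - w_prev)
        w_prev, w = w, w_next
    return w

# toy usage: minimize the quadratic 0.5 * w^T A w - b^T w
A = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])
w_star = heavy_ball(lambda w: A @ w - b, w0=np.zeros(2))
print(np.allclose(w_star, np.linalg.solve(A, b), atol=1e-4))  # True
```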

Geometry of linear convolutional networks

K Kohn, T Merkh, G Montúfar, M Trager - SIAM Journal on Applied Algebra and …, 2022 - SIAM
We study the family of functions that are represented by a linear convolutional network
(LCN). These functions form a semi-algebraic subset of the set of linear maps from input …
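
With no nonlinearities, stacking convolutional layers is the same as convolving once with the composed filter, i.e., the product of the layer filters viewed as polynomials; this is the structure the semi-algebraic description builds on. A quick NumPy check (signal and filter lengths are arbitrary):

```python
import numpy as np

x = np.random.randn(50)
f1, f2 = np.random.randn(3), np.random.randn(5)

two_layers = np.convolve(np.convolve(x, f1), f2)   # apply the layers one by one
one_layer = np.convolve(x, np.convolve(f1, f2))    # single composed filter
print(np.allclose(two_layers, one_layer))          # True: convolution is associative
```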