Normalization techniques in training DNNs: Methodology, analysis and application
Normalization techniques are essential for accelerating the training and improving the
generalization of deep neural networks (DNNs), and have successfully been used in various …
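For reference, the operation these techniques share is simple: Batch Normalization, the prototypical case, standardizes each activation over a mini-batch $B$ and then re-scales it with learned parameters, $\hat{x}_i = (x_i - \mu_B)/\sqrt{\sigma_B^2 + \epsilon}$ and $y_i = \gamma \hat{x}_i + \beta$, where $\mu_B$ and $\sigma_B^2$ are the batch mean and variance. Layer, instance, and group normalization differ mainly in the axes over which these statistics are computed.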
Understanding the generalization benefit of normalization layers: Sharpness reduction
Normalization layers (e.g., Batch Normalization, Layer Normalization) were
introduced to help with optimization difficulties in very deep nets, but they clearly also help …
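For context on the term in the title: sharpness is commonly quantified by the largest eigenvalue of the training-loss Hessian, $\lambda_{\max}(\nabla^2 L(\theta))$, with flatter minima (smaller $\lambda_{\max}$) empirically associated with better generalization; the title's claim is that training with normalization layers implicitly biases the iterates toward such flatter regions.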
AdamP: Slowing down the slowdown for momentum optimizers on scale-invariant weights
Normalization techniques are a boon for modern deep learning. They let weights converge
more quickly, often with better generalization performance. It has been argued that the …
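The "slowdown" in the title refers to the effective learning rate for scale-invariant weights shrinking as momentum inflates their norm. A minimal sketch of the underlying fix, removing the radial (norm-growing) component of an update, is given below; it is an illustration under that reading, not the authors' reference implementation, and project_out_radial is a hypothetical helper name.

    import numpy as np

    def project_out_radial(update, w, eps=1e-8):
        # For a scale-invariant weight w (L(c*w) = L(w)), only the component of the
        # update orthogonal to w changes the represented function; dropping the
        # radial component keeps ||w|| from growing and the effective step from shrinking.
        w_unit = w / (np.linalg.norm(w) + eps)
        return update - np.dot(update, w_unit) * w_unit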
Fast mixing of stochastic gradient descent with normalization and weight decay
We prove the Fast Equilibrium Conjecture proposed by Li et al. (2020), i.e.,
stochastic gradient descent (SGD) on a scale-invariant loss (e.g., using networks with various …
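For reference, a loss $L$ is scale-invariant when $L(c\,w) = L(w)$ for all $c > 0$; differentiating at $c = 1$ gives $\langle \nabla L(w), w \rangle = 0$, so the gradient never changes the weight norm and only weight decay does. This is the standard viewpoint under which SGD with normalization layers and weight decay is analyzed as a mixing process on the sphere of fixed-norm weights.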
Fed-CO: Cooperation of Online and Offline Models for Severe Data Heterogeneity in Federated Learning
Federated Learning (FL) has emerged as a promising distributed learning paradigm that
enables multiple clients to learn a global model collaboratively without sharing their private …
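As background for the setting (not this paper's specific online/offline cooperation mechanism), the basic federated averaging loop that such methods build on looks roughly as follows; all names here are illustrative.

    def fedavg_round(global_weights, client_datasets, local_train):
        # Each client fine-tunes the global model on its private data and returns new
        # weights; the server aggregates them with a data-size-weighted average, so
        # raw data never leaves a client.
        updates, sizes = [], []
        for data in client_datasets:
            updates.append(local_train(global_weights, data))
            sizes.append(len(data))
        total = sum(sizes)
        return [sum((n / total) * w[i] for w, n in zip(updates, sizes))
                for i in range(len(global_weights))]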
The implicit bias of batch normalization in linear models and two-layer linear convolutional neural networks
We study the implicit bias of batch normalization trained by gradient descent. We show that
when learning a linear model with batch normalization for binary classification, gradient …
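In this setting, "a linear model with batch normalization" is typically the predictor $f(x) = \gamma \cdot (w^\top x - \mu_B)/\sigma_B + \beta$, where $\mu_B$ and $\sigma_B$ are the batch mean and standard deviation of $w^\top x$; the implicit-bias question is which of the many directions of $w$ that separate the two classes gradient descent converges to.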
Batch Normalization Alleviates the Spectral Bias in Coordinate Networks
Z Cai, H Zhu, Q Shen, X Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Representing signals using coordinate networks has recently come to dominate the area of inverse problems
and is widely applied in various scientific computing tasks. Still, there exists an issue …
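For readers unfamiliar with the term, a coordinate network is an MLP that maps input coordinates (e.g., a pixel location) to signal values (e.g., that pixel's intensity), fit to a single signal; the sketch below shows where batch normalization layers would be inserted in such a network. It is an illustrative construction, not the authors' exact architecture.

    import torch.nn as nn

    def coordinate_net(hidden=256, depth=4):
        layers, in_dim = [], 2                        # input: (x, y) coordinate
        for _ in range(depth):
            layers += [nn.Linear(in_dim, hidden), nn.BatchNorm1d(hidden), nn.ReLU()]
            in_dim = hidden
        layers.append(nn.Linear(in_dim, 1))           # output: signal value at that coordinate
        return nn.Sequential(*layers)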
Neural tangent kernel empowered federated learning
Federated learning (FL) is a privacy-preserving paradigm where multiple participants jointly
solve a machine learning problem without sharing raw data. Unlike traditional distributed …
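For reference, the neural tangent kernel of a network $f(x; \theta)$ is $\Theta(x, x') = \langle \nabla_\theta f(x; \theta), \nabla_\theta f(x'; \theta) \rangle$; in the infinite-width limit it stays fixed during training, so learning reduces to kernel regression with $\Theta$, which is the property this line of work brings to the federated setting.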
A modular analysis of provable acceleration via Polyak's momentum: Training a wide ReLU network and a deep linear network
Incorporating a so-called “momentum” dynamic in gradient descent methods is widely used
in neural net training as it has been broadly observed that, at least empirically, it often leads …
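Polyak's (heavy-ball) momentum refers to the update $w_{t+1} = w_t - \eta \nabla L(w_t) + \beta (w_t - w_{t-1})$, where $\eta$ is the step size and $\beta \in [0, 1)$ the momentum coefficient; the question studied here is when this provably accelerates over plain gradient descent ($\beta = 0$) in the network settings named in the title.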
Geometry of linear convolutional networks
We study the family of functions that are represented by a linear convolutional network
(LCN). These functions form a semi-algebraic subset of the set of linear maps from input …
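The basic algebraic fact behind this function space is that composing linear convolutional layers (assuming stride 1, no nonlinearities, full convolutions) yields a single convolution whose filter is the convolution of the layer filters, i.e., the product of the associated polynomials; the filters reachable as such products form the semi-algebraic set being studied. A quick numerical check of the composition identity:

    import numpy as np

    f1 = np.array([1.0, -2.0, 0.5])   # filter of layer 1
    f2 = np.array([0.3, 1.0])         # filter of layer 2
    x  = np.random.randn(16)          # input signal

    # Applying the two layers in sequence equals one convolution with the composed filter.
    two_layers  = np.convolve(f2, np.convolve(f1, x))
    single_conv = np.convolve(np.convolve(f2, f1), x)
    assert np.allclose(two_layers, single_conv)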