Symmetries, flat minima, and the conserved quantities of gradient flow
Empirical studies of the loss landscape of deep networks have revealed that many local
minima are connected through low-loss valleys. Yet, little is known about the theoretical …
Symmetry teleportation for accelerated optimization
Existing gradient-based optimization methods update parameters locally, in a direction that
minimizes the loss function. We study a different approach, symmetry teleportation, that …
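The core move can be sketched on a toy case: for a two-layer linear network with loss 0.5*||W2 W1 X - Y||^2, any invertible G maps (W2, W1) to (W2 G^-1, G W1) without changing the loss, so one can search along the symmetry orbit for a point with a larger gradient before resuming descent. The NumPy sketch below uses random search in place of the paper's optimization over the group; all names are our own illustration, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-layer linear network: the loss is invariant under
# (W2, W1) -> (W2 @ inv(G), G @ W1) for any invertible G.
d, n = 4, 32
X, Y = rng.normal(size=(d, n)), rng.normal(size=(d, n))
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))

def loss(W2, W1):
    return 0.5 * np.sum((W2 @ W1 @ X - Y) ** 2)

def grad_sqnorm(W2, W1):
    R = W2 @ W1 @ X - Y
    g2, g1 = R @ (W1 @ X).T, W2.T @ R @ X.T   # dL/dW2, dL/dW1
    return np.sum(g2 ** 2) + np.sum(g1 ** 2)

def teleport(W2, W1, trials=200, eps=0.1):
    """Search over group elements G near the identity, keeping the point
    on the same loss level set with the largest gradient norm.
    (The paper optimizes over the group; random search is a stand-in.)"""
    best = (grad_sqnorm(W2, W1), W2, W1)
    for _ in range(trials):
        G = np.eye(d) + eps * rng.normal(size=(d, d))
        cand = (W2 @ np.linalg.inv(G), G @ W1)
        gn = grad_sqnorm(*cand)
        if gn > best[0]:
            best = (gn, *cand)
    return best[1], best[2]

W2t, W1t = teleport(W2, W1)
print(np.isclose(loss(W2, W1), loss(W2t, W1t)))          # True: same loss
print(grad_sqnorm(W2, W1), "->", grad_sqnorm(W2t, W1t))  # larger gradient
```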
Improving Convergence and Generalization Using Parameter Symmetries
In many neural networks, different values of the parameters may result in the same loss
value. Parameter space symmetries are loss-invariant transformations that change the …
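A minimal concrete instance of such a loss-invariant transformation, for readers who want one: in a ReLU network, scaling a hidden unit's incoming weights by a > 0 and its outgoing weights by 1/a leaves the function, and hence any loss, unchanged while moving the parameters. A quick NumPy check (our own illustration, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda z: np.maximum(z, 0.0)

# f(x) = W2 @ relu(W1 @ x); relu(a*z) = a*relu(z) for a > 0,
# so per-unit rescaling is a parameter space symmetry.
d_in, d_h, d_out = 3, 5, 2
W1 = rng.normal(size=(d_h, d_in))
W2 = rng.normal(size=(d_out, d_h))
x = rng.normal(size=(d_in,))

a = rng.uniform(0.5, 2.0, size=d_h)   # one positive scale per hidden unit
W1s = a[:, None] * W1                 # scale each unit's incoming weights
W2s = W2 / a[None, :]                 # inverse-scale its outgoing weights

print(np.allclose(W2 @ relu(W1 @ x), W2s @ relu(W1s @ x)))  # True
```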
A Practical Approach for Employing Tensor Train Decomposition in Edge Devices
Deep Neural Networks (DNNs) have made significant advances in various fields
including speech recognition and image processing. Typically, modern DNNs are both …
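For context on the underlying technique: tensor train (TT) decomposition factors a reshaped weight tensor into a chain of small 3-way cores, typically via sequential truncated SVDs (the standard TT-SVD algorithm). The sketch below shows the mechanics on a random matrix, which compresses poorly; pretrained weights, being closer to low rank, fare better. It is a generic TT-SVD, not the paper's edge-device pipeline.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Tensor-train decomposition via sequential truncated SVDs
    (standard TT-SVD), with every TT rank capped at max_rank."""
    dims = tensor.shape
    cores, r = [], 1
    mat = tensor.reshape(dims[0], -1)
    for k in range(len(dims) - 1):
        mat = mat.reshape(r * dims[k], -1)
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        rk = min(max_rank, len(S))
        cores.append(U[:, :rk].reshape(r, dims[k], rk))
        mat = S[:rk, None] * Vt[:rk]   # carry the remainder forward
        r = rk
    cores.append(mat.reshape(r, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=1)  # contract shared TT ranks
    return out.squeeze(axis=(0, -1))

# A 64x64 weight matrix reshaped into an 8x8x8x8 tensor and compressed:
rng = np.random.default_rng(2)
W = rng.normal(size=(64, 64))
cores = tt_svd(W.reshape(8, 8, 8, 8), max_rank=8)
W_hat = tt_reconstruct(cores).reshape(64, 64)
err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"params: {sum(c.size for c in cores)} vs {W.size}, rel. error {err:.2f}")
```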
Charting Flat Minima Using the Conserved Quantities of Gradient Flow
Empirical studies have revealed that many minima in the loss landscape of deep learning
are connected and reside on a low-loss valley. We present a general framework for finding …
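One well-known conserved quantity makes the idea concrete: under gradient flow on a two-layer linear network, Q = W1 W1^T - W2^T W2 stays constant along the trajectory (a consequence of the network's continuous symmetry). The NumPy sketch below approximates gradient flow with small-step gradient descent and checks that Q barely drifts; it illustrates the kind of quantity involved, not the paper's general construction.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 4, 32
X, Y = rng.normal(size=(d, n)), rng.normal(size=(d, n))
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))

# For L = 0.5*||W2 W1 X - Y||^2, gradient flow conserves
# Q = W1 W1^T - W2^T W2.
Q = lambda W1, W2: W1 @ W1.T - W2.T @ W2
Q0 = Q(W1, W2)

lr = 1e-4  # small step so gradient descent approximates gradient flow
for _ in range(5000):
    R = W2 @ W1 @ X - Y
    g2, g1 = R @ (W1 @ X).T, W2.T @ R @ X.T
    W2, W1 = W2 - lr * g2, W1 - lr * g1

drift = np.linalg.norm(Q(W1, W2) - Q0) / np.linalg.norm(Q0)
print(f"relative drift of Q: {drift:.2e}")  # small for small lr
```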
Conic Activation Functions
C Fu, LD Cohen - UniReps: 2nd Edition of the Workshop on Unifying … - openreview.net
Most activation functions operate component-wise, which restricts the equivariance of neural
networks to permutations. We introduce Conic Linear Units (CoLU) and generalize the …
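We could not verify the exact CoLU formula from this snippet, so the sketch below only illustrates the general point: an activation that acts on a whole block through a cone, rather than component-wise, commutes with rotations and not merely permutations. Here we use projection onto the second-order cone as a stand-in; it should not be read as the authors' CoLU definition.

```python
import numpy as np

def soc_projection(x):
    """Project x = (t, v) onto the second-order cone {(t, v): ||v|| <= t}.
    Unlike a component-wise ReLU, this acts on the block as a whole,
    so it commutes with any rotation of the v-part."""
    t, v = x[0], x[1:]
    nv = np.linalg.norm(v)
    if nv <= t:
        return x.copy()          # already inside the cone
    if nv <= -t:
        return np.zeros_like(x)  # inside the polar cone: project to 0
    s = (t + nv) / 2.0
    return np.concatenate(([s], (s / nv) * v))

# Equivariance check: rotating the v-part before or after the
# activation gives the same result.
rng = np.random.default_rng(4)
x = rng.normal(size=5)
R, _ = np.linalg.qr(rng.normal(size=(4, 4)))  # random orthogonal matrix
rot = lambda z: np.concatenate(([z[0]], R @ z[1:]))
print(np.allclose(soc_projection(rot(x)), rot(soc_projection(x))))  # True
```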