Symmetries, flat minima, and the conserved quantities of gradient flow

B Zhao, I Ganev, R Walters, R Yu… - arXiv preprint arXiv …, 2022 - arxiv.org
Empirical studies of the loss landscape of deep networks have revealed that many local
minima are connected through low-loss valleys. Yet, little is known about the theoretical …

Symmetry teleportation for accelerated optimization

B Zhao, N Dehmamy, R Walters… - Advances in neural …, 2022 - proceedings.neurips.cc
Existing gradient-based optimization methods update parameters locally, in a direction that
minimizes the loss function. We study a different approach, symmetry teleportation, that …

Improving Convergence and Generalization Using Parameter Symmetries

B Zhao, RM Gower, R Walters, R Yu - arXiv preprint arXiv:2305.13404, 2023 - arxiv.org
In many neural networks, different values of the parameters may result in the same loss
value. Parameter space symmetries are loss-invariant transformations that change the …

A Practical Approach for Employing Tensor Train Decomposition in Edge Devices

M Kokhazadeh, G Keramidas, V Kelefouras… - International Journal of …, 2024 - Springer
Deep Neural Networks (DNN) have made significant advances in various fields
including speech recognition and image processing. Typically, modern DNNs are both …

Charting Flat Minima Using the Conserved Quantities of Gradient Flow

B Zhao, I Ganev, R Walters, R Yu… - NeurIPS 2022 Workshop …, 2022 - openreview.net
Empirical studies have revealed that many minima in the loss landscape of deep learning
are connected and reside in a low-loss valley. We present a general framework for finding …

Finding Symmetry in Neural Network Parameter Spaces

B Zhao, N Dehmamy, R Walters, R Yu - openreview.net
Parameter space symmetries, or loss-invariant transformations, are important for
understanding neural networks' loss landscape, training dynamics, and generalization …

Conic Activation Functions

C Fu, LD Cohen - UniReps: 2nd Edition of the Workshop on Unifying … - openreview.net
Most activation functions operate component-wise, which restricts the equivariance of neural
networks to permutations. We introduce Conic Linear Units (CoLU) and generalize the …