Unraveling attention via convex duality: Analysis and interpretations of vision transformers
Vision transformers using self-attention or its proposed alternatives have demonstrated
promising results in many image-related tasks. However, the underpinning inductive bias of …
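For context, the self-attention map such convex-duality analyses target is the standard scaled dot-product form (a textbook definition; the notation here is assumed, not taken from the paper):
\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( \frac{Q K^\top}{\sqrt{d_k}} \right) V,
\]
where Q, K, and V are the query, key, and value matrices and d_k is the key dimension.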
Fast convex optimization for two-layer ReLU networks: Equivalent model classes and cone decompositions
We develop fast algorithms and robust software for convex optimization of two-layer neural
networks with ReLU activation functions. Our work leverages a convex re-formulation of the …
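As a sketch of the kind of convex re-formulation this line of work builds on (the known squared-loss program from the convex neural networks literature; the symbols P, D_i, and \beta are assumptions here, not the paper's notation): enumerating the ReLU activation patterns D_i = \mathrm{diag}(\mathbb{1}[X u \ge 0]) over the data matrix X turns the non-convex two-layer training problem into the group-sparse convex program
\[
\min_{\{v_i, w_i\}} \frac{1}{2} \Big\| \sum_{i=1}^{P} D_i X (v_i - w_i) - y \Big\|_2^2 + \beta \sum_{i=1}^{P} \big( \|v_i\|_2 + \|w_i\|_2 \big)
\quad \text{s.t.} \quad (2 D_i - I_n) X v_i \ge 0, \;\; (2 D_i - I_n) X w_i \ge 0,
\]
where the cone constraints force each pair (v_i, w_i) to respect the activation pattern encoded by D_i.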
Global optimality beyond two layers: Training deep ReLU networks via convex programs
Understanding the fundamental mechanism behind the success of deep neural networks is
one of the key challenges in the modern machine learning literature. Despite numerous …
Vector-output ReLU neural network problems are copositive programs: Convex analysis of two-layer networks and polynomial-time algorithms
We describe the convex semi-infinite dual of the two-layer vector-output ReLU neural
network training problem. This semi-infinite dual admits a finite-dimensional representation …
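For background on the copositive programs of the title (a standard definition, not this paper's notation): a symmetric matrix M is copositive if
\[
x^\top M x \ge 0 \quad \text{for all } x \ge 0,
\]
and a copositive program optimizes a linear objective over the cone of copositive matrices; such programs are NP-hard in general, which is why the title's polynomial-time algorithms for structured cases are notable.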
Demystifying batch normalization in ReLU networks: Equivalent convex optimization models and implicit regularization
Batch Normalization (BN) is a commonly used technique to accelerate and stabilize training
of deep neural networks. Despite its empirical success, a full theoretical understanding of …
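For reference, BN is the usual per-feature normalization with learned scale and shift (textbook form):
\[
\mathrm{BN}(x) = \gamma \, \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta,
\]
where \mu_B and \sigma_B^2 are the mean and variance over the batch B and \gamma, \beta are learned parameters.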
A neural tangent kernel perspective of GANs
We propose a novel theoretical framework of analysis for Generative Adversarial Networks
(GANs). We reveal a fundamental flaw of previous analyses which, by incorrectly modeling …
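The neural tangent kernel of the title is the standard object (a textbook definition, not this paper's notation):
\[
K_\theta(x, x') = \nabla_\theta f(x; \theta)^\top \nabla_\theta f(x'; \theta),
\]
which, for sufficiently wide networks, stays approximately fixed at its initialization value throughout training.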
Path regularization: A convexity and sparsity-inducing regularization for parallel ReLU networks
Understanding the fundamental principles behind the success of deep neural networks is
one of the most important open questions in the current literature. To this end, we study the …
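For orientation, the standard path-norm regularizer from the literature sums, over all input-to-output paths, the product of absolute weights along each path (a common definition; the path regularization studied in this paper may differ in detail):
\[
R_{\mathrm{path}}(\theta) = \sum_{p \in \text{paths}} \prod_{e \in p} |w_e|.
\]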
Globally optimal training of neural networks with threshold activation functions
Threshold activation functions are highly preferable in neural networks due to their efficiency
in hardware implementations. Moreover, their mode of operation is more interpretable and …
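The threshold (unit-step) activation in question is, in its standard form,
\[
\sigma(t) = \mathbb{1}\{ t \ge 0 \} = \begin{cases} 1, & t \ge 0, \\ 0, & t < 0, \end{cases}
\]
whose binary output is what makes hardware implementations cheap: each neuron reduces to a comparator.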
Fixing the NTK: from neural network linearizations to exact convex programs
RV Dwaraknath, T Ergen… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recently, theoretical analyses of deep neural networks have broadly focused on two
directions: 1) Providing insight into neural network training by SGD in the limit of infinite …
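The linearizations of the title refer to the standard first-order (NTK-regime) Taylor expansion of the network around its initialization \theta_0:
\[
f(x; \theta) \approx f(x; \theta_0) + \nabla_\theta f(x; \theta_0)^\top (\theta - \theta_0),
\]
under which gradient descent reduces to kernel regression with the neural tangent kernel.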
Parallel deep neural networks have zero duality gap
Training deep neural networks is a challenging non-convex optimization problem. Recent
work has proven that strong duality holds (i.e., a zero duality gap) for regularized …
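In standard Lagrangian terms, for a Lagrangian \mathcal{L}(\theta, \lambda) of a constrained reformulation of the training problem, weak duality always gives d^* \le p^*, and a zero duality gap is the statement that the bound is tight:
\[
d^* = \max_{\lambda} \min_{\theta} \mathcal{L}(\theta, \lambda) \;\le\; \min_{\theta} \max_{\lambda} \mathcal{L}(\theta, \lambda) = p^*, \qquad \text{strong duality:} \; d^* = p^*.
\]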