Unraveling attention via convex duality: Analysis and interpretations of vision transformers

A Sahiner, T Ergen, B Ozturkler… - International …, 2022 - proceedings.mlr.press
Vision transformers using self-attention or its proposed alternatives have demonstrated
promising results in many image-related tasks. However, the underpinning inductive bias of …

Fast convex optimization for two-layer ReLU networks: Equivalent model classes and cone decompositions

A Mishkin, A Sahiner, M Pilanci - … Conference on Machine …, 2022 - proceedings.mlr.press
We develop fast algorithms and robust software for convex optimization of two-layer neural
networks with ReLU activation functions. Our work leverages a convex reformulation of the …
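
As a rough illustration of the kind of convex reformulation this line of work builds on (a minimal sketch, not the authors' released software; the sampled activation patterns, problem sizes, and the use of cvxpy are assumptions made for illustration):

    # Convex reformulation of two-layer ReLU training as a constrained
    # group-lasso problem over sampled hyperplane-arrangement patterns.
    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    n, d, P, beta = 20, 3, 8, 0.1      # samples, features, sampled patterns, weight decay
    X = rng.standard_normal((n, d))
    y = rng.standard_normal(n)

    # Sample diagonal activation-pattern matrices D_i = diag(1[X g >= 0]).
    D = [np.diag((X @ rng.standard_normal(d) >= 0).astype(float)) for _ in range(P)]

    V = cp.Variable((d, P))            # "positive" neurons
    W = cp.Variable((d, P))            # "negative" neurons
    residual = sum(D[i] @ X @ (V[:, i] - W[:, i]) for i in range(P)) - y
    reg = sum(cp.norm(V[:, i], 2) + cp.norm(W[:, i], 2) for i in range(P))
    constraints = []
    for i in range(P):
        A = (2 * D[i] - np.eye(n)) @ X         # keep each neuron on its sign pattern
        constraints += [A @ V[:, i] >= 0, A @ W[:, i] >= 0]

    prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(residual) + beta * reg), constraints)
    prob.solve()
    print("convex objective:", prob.value)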

Global optimality beyond two layers: Training deep ReLU networks via convex programs

T Ergen, M Pilanci - International Conference on Machine …, 2021 - proceedings.mlr.press
Understanding the fundamental mechanism behind the success of deep neural networks is
one of the key challenges in the modern machine learning literature. Despite numerous …

Vector-output ReLU neural network problems are copositive programs: Convex analysis of two-layer networks and polynomial-time algorithms

A Sahiner, T Ergen, J Pauly, M Pilanci - arXiv preprint arXiv:2012.13329, 2020 - arxiv.org
We describe the convex semi-infinite dual of the two-layer vector-output ReLU neural
network training problem. This semi-infinite dual admits a finite-dimensional representation …
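
Schematically, such a dual has the following shape (a hedged sketch consistent with this line of work, not reproduced from the paper; here X in R^{n x d} is the data matrix, Y in R^{n x c} the targets, and β the weight-decay parameter):

    d^\star \;=\; \max_{V \in \mathbb{R}^{n \times c}} \; -\tfrac{1}{2}\lVert V - Y \rVert_F^2 + \tfrac{1}{2}\lVert Y \rVert_F^2
    \quad \text{s.t.} \quad \lVert V^\top (X u)_+ \rVert_2 \le \beta \quad \forall\, u:\ \lVert u \rVert_2 \le 1,

a convex program with infinitely many constraints indexed by the unit ball of first-layer weights; the finite-dimensional representation replaces this index set with the finitely many ReLU sign patterns that X can realize.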

Demystifying batch normalization in ReLU networks: Equivalent convex optimization models and implicit regularization

T Ergen, A Sahiner, B Ozturkler, J Pauly… - arXiv preprint arXiv …, 2021 - arxiv.org
Batch Normalization (BN) is a commonly used technique to accelerate and stabilize training
of deep neural networks. Despite its empirical success, a full theoretical understanding of …
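
For reference, the BN transform under study is the standard per-feature normalization followed by a learnable scale and shift; a minimal training-mode sketch (hypothetical shapes, numpy only, not the paper's convex reformulation):

    import numpy as np

    def batch_norm(x, gamma, beta, eps=1e-5):
        """Normalize each feature over the batch, then rescale and shift."""
        mu = x.mean(axis=0)                      # per-feature batch mean
        var = x.var(axis=0)                      # per-feature batch variance
        x_hat = (x - mu) / np.sqrt(var + eps)    # zero mean, unit variance
        return gamma * x_hat + beta              # learnable scale and shift

    x = np.random.default_rng(0).standard_normal((32, 4))   # batch of 32, 4 features
    y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
    print(y.mean(axis=0).round(6), y.std(axis=0).round(3))  # ~0 and ~1 per feature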

A neural tangent kernel perspective of GANs

JY Franceschi, E De Bézenac, I Ayed… - International …, 2022 - proceedings.mlr.press
We propose a novel theoretical framework of analysis for Generative Adversarial Networks
(GANs). We reveal a fundamental flaw of previous analyses which, by incorrectly modeling …
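
The NTK central to such analyses is the kernel of parameter gradients, Θ(x, x') = ∇θ f(x; θ)·∇θ f(x'; θ); a minimal numerical sketch for a toy two-layer network (hypothetical setup, not the paper's GAN analysis; finite-difference gradients are used only to keep the example dependency-free):

    import numpy as np

    rng = np.random.default_rng(0)
    d, m = 2, 16                                   # input dim, hidden width
    W, a = rng.standard_normal((m, d)), rng.standard_normal(m)
    theta = np.concatenate([W.ravel(), a])

    def f(theta, x):
        W = theta[: m * d].reshape(m, d)
        a = theta[m * d:]
        return a @ np.maximum(W @ x, 0.0) / np.sqrt(m)   # two-layer ReLU net

    def grad(theta, x, eps=1e-5):
        g = np.zeros_like(theta)
        for i in range(theta.size):                      # finite-difference gradient
            e = np.zeros_like(theta); e[i] = eps
            g[i] = (f(theta + e, x) - f(theta - e, x)) / (2 * eps)
        return g

    x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
    print(grad(theta, x1) @ grad(theta, x2))             # empirical NTK entry Θ(x1, x2)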

Path regularization: A convexity- and sparsity-inducing regularization for parallel ReLU networks

T Ergen, M Pilanci - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Understanding the fundamental principles behind the success of deep neural networks is
one of the most important open questions in the current literature. To this end, we study the …

Globally optimal training of neural networks with threshold activation functions

T Ergen, HI Gulluk, J Lacotte, M Pilanci - arXiv preprint arXiv:2303.03382, 2023 - arxiv.org
Threshold activation functions are highly preferable in neural networks due to their efficiency
in hardware implementations. Moreover, their mode of operation is more interpretable and …
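
For context, a threshold (unit-step) activation outputs only 0 or 1, in contrast to a ReLU; a minimal sketch (hypothetical example, not the paper's training method):

    import numpy as np

    def threshold(z):                # s(z) = 1 if z >= 0 else 0: a single comparator in hardware
        return (z >= 0).astype(float)

    def relu(z):
        return np.maximum(z, 0.0)

    z = np.array([-1.5, -0.2, 0.0, 0.7, 2.3])
    print(threshold(z))              # [0. 0. 1. 1. 1.]
    print(relu(z))                   # [0.  0.  0.  0.7 2.3]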

Fixing the NTK: From neural network linearizations to exact convex programs

RV Dwaraknath, T Ergen… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recently, theoretical analyses of deep neural networks have broadly focused on two
directions: 1) Providing insight into neural network training by SGD in the limit of infinite …
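
The linearization referred to in the first direction is the first-order Taylor expansion of the network in its parameters, with the NTK as the induced kernel (standard definitions, not specific to this paper):

    f(x; \theta) \;\approx\; f(x; \theta_0) + \nabla_\theta f(x; \theta_0)^\top (\theta - \theta_0),
    \qquad
    \Theta(x, x') \;=\; \nabla_\theta f(x; \theta_0)^\top \nabla_\theta f(x'; \theta_0).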

Parallel deep neural networks have zero duality gap

Y Wang, T Ergen, M Pilanci - arXiv preprint arXiv:2110.06482, 2021 - arxiv.org
Training deep neural networks is a challenging non-convex optimization problem. Recent
work has proven that strong duality holds (i.e., there is zero duality gap) for regularized …
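
In standard convex-analysis terms (a generic statement, not the paper's specific theorem): if p* is the optimal value of the nonconvex primal training problem and d* that of its Lagrange dual, then

    d^\star \;\le\; p^\star \quad \text{(weak duality)}, \qquad \text{zero duality gap:} \quad d^\star = p^\star,

so when the gap is zero the training problem can, in principle, be solved exactly through its convex dual.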