Solving linear programs in the current matrix multiplication time

MB Cohen, YT Lee, Z Song - Journal of the ACM (JACM), 2021 - dl.acm.org
This article shows how to solve linear programs of the form min_{Ax=b, x≥0} c^⊤x with n
variables in time O*((n^ω + n^{2.5−α/2} + n^{2+1/6}) log(n/δ)), where ω is the exponent of matrix …
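
As a hedged illustration of the standard-form LP this paper targets (not the paper's own interior-point algorithm), the sketch below builds a feasible, bounded instance and hands it to SciPy's off-the-shelf solver; all problem data is made up.

```python
# Minimal sketch of the standard-form LP the paper targets:
#   min c^T x  subject to  Ax = b, x >= 0,
# solved here with SciPy's generic solver, NOT the paper's
# interior-point method. All problem data below is made up.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m = 6, 3                        # n variables, m equality constraints
A = rng.standard_normal((m, n))
x_feas = rng.uniform(1.0, 2.0, n)  # a strictly positive point ...
b = A @ x_feas                     # ... so {Ax = b, x >= 0} is nonempty
c = rng.uniform(0.5, 1.5, n)       # positive costs keep the LP bounded

res = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * n, method="highs")
print(res.fun, res.x)              # optimal value and an optimal vertex
```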

Attention scheme inspired softmax regression

Y Deng, Z Li, Z Song - arXiv preprint arXiv:2304.10411, 2023 - arxiv.org
Large language models (LLMs) have brought transformative changes to human society. One of
the key computations in LLMs is the softmax unit. This operation is important in LLMs …
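
For background on the softmax unit the abstract refers to, here is a standalone, numerically stable implementation; it is illustrative and not taken from the paper.

```python
# Numerically stable softmax, the unit the abstract refers to.
# Subtracting the max before exponentiating avoids overflow and does
# not change the output, since softmax is shift-invariant.
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    shifted = z - z.max()   # shift for numerical stability
    e = np.exp(shifted)
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))  # entries sum to 1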

Improved architectures and training algorithms for deep operator networks

S Wang, H Wang, P Perdikaris - Journal of Scientific Computing, 2022 - Springer
Operator learning techniques have recently emerged as a powerful tool for learning maps
between infinite-dimensional Banach spaces. Trained under appropriate constraints, they …
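
For context on the setup: the deep operator networks (DeepONets) this paper improves approximate an operator G via a branch net applied to sampled values of the input function and a trunk net applied to the query location, combined by a dot product, G(u)(y) ≈ Σ_k b_k(u) t_k(y). A minimal forward pass under that assumption follows; weights are random and untrained, and the layer sizes are made up.

```python
# Minimal DeepONet forward pass: G(u)(y) ~ sum_k branch_k(u) * trunk_k(y).
# Random untrained weights; sizes are illustrative, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
m, p = 50, 32  # m input-function sensors, p latent features

def mlp(x, sizes):
    """Tiny tanh MLP with fixed random weights (illustration only)."""
    for i, (d_in, d_out) in enumerate(zip(sizes, sizes[1:])):
        W = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
        x = x @ W
        if i < len(sizes) - 2:
            x = np.tanh(x)
    return x

u_sensors = np.sin(np.linspace(0, np.pi, m))  # input function u at m sensors
y = np.array([0.3])                           # query location

b = mlp(u_sensors, [m, 64, p])  # branch net: encodes the input function
t = mlp(y, [1, 64, p])          # trunk net: encodes the query location
print(b @ t)                    # dot product gives G(u)(y)
```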

How to Protect Copyright Data in Optimization of Large Language Models?

T Chu, Z Song, C Yang - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
The softmax operator is a crucial component of large language models (LLMs), which have
played a transformative role in computer research. Due to the centrality of the softmax …

Pixelated butterfly: Simple and efficient sparse training for neural network models

T Dao, B Chen, K Liang, J Yang, Z Song… - arXiv preprint arXiv …, 2021 - arxiv.org
Overparameterized neural networks generalize well but are expensive to train. Ideally, one
would like to reduce their computational cost while retaining their generalization benefits …
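
As background on the structure this paper builds on (a generic butterfly factorization, not the paper's "pixelated" flat variant): a butterfly matrix of size n = 2^k factors into log2(n) stages, each mixing pairs of coordinates at a fixed stride with a 2x2 block, so a matrix-vector product costs O(n log n) rather than O(n^2).

```python
# Butterfly matrix-vector product, applied in place: log2(n) stages,
# each mixing coordinate pairs at one stride with a random 2x2 block,
# for O(n log n) total work. Generic butterfly, not the paper's variant.
import numpy as np

rng = np.random.default_rng(0)
n = 8                           # must be a power of two
x = rng.standard_normal(n)

stride = 1
while stride < n:
    for start in range(0, n, 2 * stride):
        for i in range(start, start + stride):
            a, b, c, d = rng.standard_normal(4)  # one 2x2 block
            xi, xj = x[i], x[i + stride]
            x[i], x[i + stride] = a * xi + b * xj, c * xi + d * xj
    stride *= 2
print(x)
```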

Bypass exponential time preprocessing: Fast neural network training via weight-data correlation preprocessing

J Alman, Z Song, R Zhang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Over the last decade, deep neural networks have transformed our society, and they are
already widely applied in various machine learning applications. State-of-the-art deep …

Does preprocessing help training over-parameterized neural networks?

Z Song, S Yang, R Zhang - Advances in Neural Information …, 2021 - proceedings.neurips.cc
Deep neural networks have achieved impressive performance in many areas. Designing a
fast and provable method for training neural networks is a fundamental question in machine …

Training multi-layer over-parametrized neural network in subquadratic time

Z Song, L Zhang, R Zhang - arXiv preprint arXiv:2112.07628, 2021 - arxiv.org
We consider the problem of training a multi-layer over-parametrized neural network to
minimize the empirical risk induced by a loss function. In the typical setting of over …
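
For reference, the empirical risk the abstract refers to is the usual average of a loss ℓ over the n training pairs; the notation below is generic, not lifted from the paper.

```latex
% Empirical risk over n training pairs; f_theta is the network.
\widehat{R}(\theta) = \frac{1}{n}\sum_{i=1}^{n}
  \ell\bigl(f_\theta(x_i),\, y_i\bigr),
\qquad
\theta^\star \in \operatorname*{arg\,min}_{\theta} \widehat{R}(\theta).
```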

Deep equals shallow for ReLU networks in kernel regimes

A Bietti, F Bach - arXiv preprint arXiv:2009.14397, 2020 - arxiv.org
Deep networks are often considered to be more expressive than shallow ones in terms of
approximation. Indeed, certain functions can be approximated by deep networks provably …

An over-parameterized exponential regression

Y Gao, S Mahadevan, Z Song - arXiv preprint arXiv:2303.16504, 2023 - arxiv.org
Over the past few years, there has been a significant amount of research focused on
studying the ReLU activation function, with the aim of achieving neural network convergence …
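
As a hedged illustration of the kind of objective studied in this line of work (the paper's exact formulation may differ), exponential regression swaps the ReLU for the exp activation, e.g. fitting min_x ||exp(Ax) − b||^2 by gradient descent:

```python
# Toy exponential regression min_x ||exp(Ax) - b||^2 by gradient descent.
# Chain rule: grad = 2 A^T (exp(Ax) * (exp(Ax) - b)).
# Illustrative objective and step size only; not the paper's exact setup.
import numpy as np

rng = np.random.default_rng(0)
m, d = 20, 5
A = rng.standard_normal((m, d)) / np.sqrt(d)
b = np.exp(A @ rng.standard_normal(d))     # realizable target

x = np.zeros(d)
for _ in range(500):
    r = np.exp(A @ x) - b                  # residual
    grad = 2 * A.T @ (np.exp(A @ x) * r)   # chain rule through exp
    x -= 0.01 * grad
print(np.linalg.norm(np.exp(A @ x) - b))   # residual should shrink toward 0
```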