Solving linear programs in the current matrix multiplication time

MB Cohen, YT Lee, Z Song - Journal of the ACM (JACM), 2021 - dl.acm.org
This article shows how to solve linear programs of the form min_{Ax=b, x≥0} c^⊤x with n
variables in time O*((n^ω + n^{2.5−α/2} + n^{2+1/6}) log(n/δ)), where ω is the exponent of matrix …
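
As a hedged illustration of the standard-form LP this paper targets (not the paper's own interior-point algorithm), the sketch below builds a feasible, bounded instance and hands it to SciPy's off-the-shelf solver; all problem data is made up.

```python
# Minimal sketch of the standard-form LP the paper targets:
#   min c^T x  subject to  Ax = b, x >= 0,
# solved here with SciPy's generic solver, NOT the paper's
# interior-point method. All problem data below is made up.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m = 6, 3                        # n variables, m equality constraints
A = rng.standard_normal((m, n))
x_feas = rng.uniform(1.0, 2.0, n)  # a strictly positive point ...
b = A @ x_feas                     # ... so {Ax = b, x >= 0} is nonempty
c = rng.uniform(0.5, 1.5, n)       # positive costs keep the LP bounded

res = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * n, method="highs")
print(res.fun, res.x)              # optimal value and an optimal vertex
```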

Attention scheme inspired softmax regression

Y Deng, Z Li, Z Song - arXiv preprint arXiv:2304.10411, 2023 - arxiv.org
Large language models (LLMs) have brought transformative changes to human society. One of
the key computations in LLMs is the softmax unit. This operation is important in LLMs …
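
For background on the softmax unit the abstract refers to, here is a standalone, numerically stable implementation; it is illustrative and not taken from the paper.

```python
# Numerically stable softmax, the unit the abstract refers to.
# Subtracting the max before exponentiating avoids overflow and does
# not change the output, since softmax is shift-invariant.
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    shifted = z - z.max()   # shift for numerical stability
    e = np.exp(shifted)
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))  # entries sum to 1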

Improved architectures and training algorithms for deep operator networks

S Wang, H Wang, P Perdikaris - Journal of Scientific Computing, 2022 - Springer
Operator learning techniques have recently emerged as a powerful tool for learning maps
between infinite-dimensional Banach spaces. Trained under appropriate constraints, they …
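
For context on the setup: the deep operator networks (DeepONets) this paper improves approximate an operator G via a branch net applied to sampled values of the input function and a trunk net applied to the query location, combined by a dot product, G(u)(y) ≈ Σ_k b_k(u) t_k(y). A minimal forward pass under that assumption follows; weights are random and untrained, and the layer sizes are made up.

```python
# Minimal DeepONet forward pass: G(u)(y) ~ sum_k branch_k(u) * trunk_k(y).
# Random untrained weights; sizes are illustrative, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
m, p = 50, 32  # m input-function sensors, p latent features

def mlp(x, sizes):
    """Tiny tanh MLP with fixed random weights (illustration only)."""
    for i, (d_in, d_out) in enumerate(zip(sizes, sizes[1:])):
        W = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
        x = x @ W
        if i < len(sizes) - 2:
            x = np.tanh(x)
    return x

u_sensors = np.sin(np.linspace(0, np.pi, m))  # input function u at m sensors
y = np.array([0.3])                           # query location

b = mlp(u_sensors, [m, 64, p])  # branch net: encodes the input function
t = mlp(y, [1, 64, p])          # trunk net: encodes the query location
print(b @ t)                    # dot product gives G(u)(y)
```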

How to Protect Copyright Data in Optimization of Large Language Models?

T Chu, Z Song, C Yang - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
The softmax operator is a crucial component of large language models (LLMs), which have
played a transformative role in computer research. Due to the centrality of the softmax …

Pixelated butterfly: Simple and efficient sparse training for neural network models

T Dao, B Chen, K Liang, J Yang, Z Song… - arXiv preprint arXiv …, 2021 - arxiv.org
Overparameterized neural networks generalize well but are expensive to train. Ideally, one
would like to reduce their computational cost while retaining their generalization benefits …
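
As background on the structure this paper builds on (a generic butterfly factorization, not the paper's "pixelated" flat variant): a butterfly matrix of size n = 2^k factors into log2(n) stages, each mixing pairs of coordinates at a fixed stride with a 2x2 block, so a matrix-vector product costs O(n log n) rather than O(n^2).

```python
# Butterfly matrix-vector product, applied in place: log2(n) stages,
# each mixing coordinate pairs at one stride with a random 2x2 block,
# for O(n log n) total work. Generic butterfly, not the paper's variant.
import numpy as np

rng = np.random.default_rng(0)
n = 8                           # must be a power of two
x = rng.standard_normal(n)

stride = 1
while stride < n:
    for start in range(0, n, 2 * stride):
        for i in range(start, start + stride):
            a, b, c, d = rng.standard_normal(4)  # one 2x2 block
            xi, xj = x[i], x[i + stride]
            x[i], x[i + stride] = a * xi + b * xj, c * xi + d * xj
    stride *= 2
print(x)
```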

Bypass exponential time preprocessing: Fast neural network training via weight-data correlation preprocessing

J Alman, Z Song, R Zhang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Over the last decade, deep neural networks have transformed our society, and they are
already widely applied in various machine learning applications. State-of-the-art deep …

Does preprocessing help training over-parameterized neural networks?

Z Song, S Yang, R Zhang - Advances in Neural Information …, 2021 - proceedings.neurips.cc
Deep neural networks have achieved impressive performance in many areas. Designing a
fast and provable method for training neural networks is a fundamental question in machine …

Training multi-layer over-parametrized neural network in subquadratic time

Z Song, L Zhang, R Zhang - arXiv preprint arXiv:2112.07628, 2021 - arxiv.org
We consider the problem of training a multi-layer over-parametrized neural network to
minimize the empirical risk induced by a loss function. In the typical setting of over …
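
For reference, the empirical risk the abstract refers to is the usual average of a loss ℓ over the n training pairs; the notation below is generic, not lifted from the paper.

```latex
% Empirical risk over n training pairs; f_theta is the network.
\widehat{R}(\theta) = \frac{1}{n}\sum_{i=1}^{n}
  \ell\bigl(f_\theta(x_i),\, y_i\bigr),
\qquad
\theta^\star \in \operatorname*{arg\,min}_{\theta} \widehat{R}(\theta).
```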

Deep equals shallow for ReLU networks in kernel regimes

A Bietti, F Bach - arXiv preprint arXiv:2009.14397, 2020 - arxiv.org
Deep networks are often considered to be more expressive than shallow ones in terms of
approximation. Indeed, certain functions can be approximated by deep networks provably …

An over-parameterized exponential regression

Y Gao, S Mahadevan, Z Song - arXiv preprint arXiv:2303.16504, 2023 - arxiv.org
Over the past few years, there has been a significant amount of research focused on
studying the ReLU activation function, with the aim of achieving neural network convergence …
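
As a hedged illustration of the kind of objective studied in this line of work (the paper's exact formulation may differ), exponential regression swaps the ReLU for the exp activation, e.g. fitting min_x ||exp(Ax) − b||^2 by gradient descent:

```python
# Toy exponential regression min_x ||exp(Ax) - b||^2 by gradient descent.
# Chain rule: grad = 2 A^T (exp(Ax) * (exp(Ax) - b)).
# Illustrative objective and step size only; not the paper's exact setup.
import numpy as np

rng = np.random.default_rng(0)
m, d = 20, 5
A = rng.standard_normal((m, d)) / np.sqrt(d)
b = np.exp(A @ rng.standard_normal(d))     # realizable target

x = np.zeros(d)
for _ in range(500):
    r = np.exp(A @ x) - b                  # residual
    grad = 2 * A.T @ (np.exp(A @ x) * r)   # chain rule through exp
    x -= 0.01 * grad
print(np.linalg.norm(np.exp(A @ x) - b))   # residual should shrink toward 0
```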