Solving linear programs in the current matrix multiplication time
This article shows how to solve linear programs of the form $\min_{Ax = b,\, x \ge 0} c^\top x$ with $n$
variables in time $O^*\!\big((n^{\omega} + n^{2.5 - \alpha/2} + n^{2 + 1/6}) \log(n/\delta)\big)$, where $\omega$ is the exponent of matrix multiplication …
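For concreteness, here is a minimal sketch of the standard-form LP above, solved with SciPy's generic solver. It is purely illustrative: this is not the paper's fast algorithm, and the instance data is hypothetical.

```python
# Minimal sketch: an LP in the standard form  min c^T x  s.t.  Ax = b, x >= 0,
# solved with SciPy's generic solver. Illustrative only; not the paper's
# O*-time algorithm, and the data below is made up.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 0.0]])   # constraint matrix (hypothetical)
b = np.array([4.0, 3.0])          # right-hand side (hypothetical)
c = np.array([1.0, 2.0, 3.0])     # objective coefficients (hypothetical)

# linprog's default variable bounds are (0, None), i.e. x >= 0.
res = linprog(c, A_eq=A, b_eq=b)
print(res.x, res.fun)             # optimum here is x = (3, 0, 1), value 6
```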
Attention scheme inspired softmax regression
Large language models (LLMs) have brought transformative changes to human society. One of
the key computations in LLMs is the softmax unit. This operation is important in LLMs …
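A minimal sketch of the softmax unit the abstract refers to, written in the standard numerically stable form (this is generic textbook code, not taken from the paper):

```python
# Numerically stable softmax: subtracting the max before exponentiating
# avoids overflow and leaves the result unchanged.
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))  # entries are positive and sum to 1
```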
Improved architectures and training algorithms for deep operator networks
Operator learning techniques have recently emerged as a powerful tool for learning maps
between infinite-dimensional Banach spaces. Trained under appropriate constraints, they …
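As context, deep operator networks (DeepONets) follow a branch–trunk design: a branch net encodes the input function sampled at fixed sensor locations, a trunk net encodes the query point, and their inner product approximates the operator output. The sketch below illustrates only that generic structure, not the paper's improved architecture; all sizes and weights are hypothetical and untrained.

```python
# Generic DeepONet-style forward pass: G(u)(y) ~ <branch(u), trunk(y)>.
# Untrained, randomly initialized weights; shapes are for illustration only.
import numpy as np

rng = np.random.default_rng(0)
m, p = 32, 16          # number of sensor points, latent width

def mlp(x, W1, W2):
    return np.tanh(x @ W1) @ W2

Wb1, Wb2 = rng.normal(size=(m, 64)), rng.normal(size=(64, p))  # branch net
Wt1, Wt2 = rng.normal(size=(1, 64)), rng.normal(size=(64, p))  # trunk net

u = np.sin(np.linspace(0, np.pi, m))   # input function sampled at sensors
y = np.array([[0.5]])                  # query location

G_u_y = mlp(u[None, :], Wb1, Wb2) @ mlp(y, Wt1, Wt2).T  # scalar output
print(G_u_y)
```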
How to Protect Copyright Data in Optimization of Large Language Models?
The softmax operator is a crucial component of large language models (LLMs), which have
played a transformative role in computer science research. Due to the centrality of the softmax …
Pixelated butterfly: Simple and efficient sparse training for neural network models
Overparameterized neural networks generalize well but are expensive to train. Ideally, one
would like to reduce their computational cost while retaining their generalization benefits …
Bypass exponential time preprocessing: Fast neural network training via weight-data correlation preprocessing
Over the last decade, deep neural networks have transformed our society, and they are
already widely applied in various machine learning applications. State-of-the-art deep …
Does preprocessing help training over-parameterized neural networks?
Deep neural networks have achieved impressive performance in many areas. Designing a
fast and provable method for training neural networks is a fundamental question in machine …
Training multi-layer over-parametrized neural network in subquadratic time
We consider the problem of training a multi-layer over-parametrized neural network to
minimize the empirical risk induced by a loss function. In the typical setting of over …
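For reference, the empirical risk in question has the standard form (generic notation, not taken from the paper): given data $(x_i, y_i)_{i=1}^{n}$, a network $f_\theta$, and a loss $\ell$,
$$\hat{L}(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_\theta(x_i),\, y_i\big).$$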
Deep equals shallow for ReLU networks in kernel regimes
Deep networks are often considered to be more expressive than shallow ones in terms of
approximation. Indeed, certain functions can be approximated by deep networks provably …
An over-parameterized exponential regression
Over the past few years, there has been a significant amount of research focused on
studying the ReLU activation function, with the aim of achieving neural network convergence …