Optimization for deep learning: An overview
RY Sun - Journal of the Operations Research Society of China, 2020 - Springer
Optimization is a critical component in deep learning. We think optimization for neural
networks is an interesting topic for theoretical research for several reasons. First, its …
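For orientation, the optimization problem this survey studies is loss minimization over network weights, most commonly attacked with first-order methods. A minimal sketch of one plain gradient-descent step on a toy objective (the function and all names below are illustrative, not taken from the paper):

```python
import numpy as np

def sgd_step(w, grad_fn, lr=0.1):
    # One vanilla gradient-descent update: w <- w - lr * grad L(w).
    return w - lr * grad_fn(w)

# Toy objective L(w) = ||w||^2 / 2, whose gradient is w itself.
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sgd_step(w, grad_fn=lambda v: v)
print(w)  # approaches the minimizer [0, 0]
```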
Piecewise linear neural networks and deep learning
As a powerful modelling method, piecewise linear neural networks (PWLNNs) have proven
successful in various fields, most recently in deep learning. To apply PWLNN methods, both …
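For context, the canonical PWLNN in deep learning is a network built from piecewise-linear activations such as ReLU: compositions of affine maps and ReLUs are themselves piecewise linear in the input. A minimal sketch illustrating this (the architecture and numbers are illustrative, not from the paper):

```python
import numpy as np

def relu_mlp(x, W1, b1, W2, b2):
    # Affine -> ReLU -> affine. Each pattern of active ReLU units
    # fixes one affine region, so the whole map is piecewise linear in x.
    h = np.maximum(0.0, W1 @ x + b1)
    return W2 @ h + b2

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)
y = relu_mlp(np.array([0.5, -1.0]), W1, b1, W2, b2)
```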
On the opportunities and risks of foundation models
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …
Att3d: Amortized text-to-3D object synthesis
Text-to-3D modelling has seen exciting progress by combining generative text-to-image
models with image-to-3D methods like Neural Radiance Fields. DreamFusion recently …
Knowledge distillation: A good teacher is patient and consistent
There is a growing discrepancy in computer vision between large-scale models that achieve
state-of-the-art performance and models that are affordable in practical applications. In this …
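The loss behind teacher-student distillation in this line of work is the standard temperature-scaled KL divergence between softened teacher and student predictions; the paper's "patient and consistent" recipe concerns how that objective is applied (consistent views for teacher and student, long schedules), not the loss form itself. A sketch of the generic loss, assuming PyTorch:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL divergence between temperature-softened teacher and student
    # distributions; the T^2 factor keeps gradient magnitudes comparable
    # across temperatures (standard Hinton-style distillation).
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher,
                    reduction="batchmean") * temperature ** 2
```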
Sophia: A scalable stochastic second-order optimizer for language model pre-training
Given the massive cost of language model pre-training, a non-trivial improvement of the
optimization algorithm would lead to a material reduction in the time and cost of training …
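Sophia's stated idea is a lightweight second-order step: momentum of the gradient, preconditioned by a periodically re-estimated diagonal Hessian, with element-wise clipping. A simplified single-step sketch of that update rule (hyperparameter names are illustrative; the full optimizer also refreshes h only every k steps and adds weight decay):

```python
import torch

def sophia_like_step(param, m, h, lr=1e-4, beta1=0.96,
                     gamma=0.01, eps=1e-12, rho=1.0):
    # Momentum of the gradient, preconditioned by a diagonal Hessian
    # estimate h, with element-wise clipping so steps stay bounded in
    # directions where curvature is tiny or the estimate is stale.
    m.mul_(beta1).add_(param.grad, alpha=1 - beta1)
    update = (m / torch.clamp(gamma * h, min=eps)).clamp(-rho, rho)
    param.data.add_(update, alpha=-lr)
```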
Cramming: Training a Language Model on a single GPU in one day
J Geiping, T Goldstein - International Conference on …, 2023 - proceedings.mlr.press
Recent trends in language modeling have focused on increasing performance through
scaling, and have resulted in an environment where training language models is out of …
Pyhessian: Neural networks through the lens of the hessian
We present PYHESSIAN, a new scalable framework that enables fast computation of
Hessian (i.e., second-order derivative) information for deep neural networks. PYHESSIAN …
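The workhorse behind fast Hessian information is the matrix-free Hessian-vector product via double backpropagation, from which quantities like the trace follow by Hutchinson's estimator. A sketch of that idea in PyTorch (this is the style of computation PyHessian enables, not its actual API):

```python
import torch

def hutchinson_trace(loss, params, n_samples=10):
    # Estimate trace(H) as E[v^T H v] over Rademacher vectors v, using
    # Hessian-vector products computed by double backpropagation:
    # differentiate the inner product of the gradient with v.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    estimate = 0.0
    for _ in range(n_samples):
        v = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]
        grad_dot_v = sum((g * u).sum() for g, u in zip(grads, v))
        Hv = torch.autograd.grad(grad_dot_v, params, retain_graph=True)
        estimate += sum((hv * u).sum().item() for hv, u in zip(Hv, v))
    return estimate / n_samples
```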
No train no gain: Revisiting efficient training algorithms for transformer-based language models
The computation necessary for training Transformer-based language models has
skyrocketed in recent years. This trend has motivated research on efficient training …
Large-scale differentially private BERT
In this work, we study the large-scale pretraining of BERT-Large with differentially private
SGD (DP-SGD). We show that combined with a careful implementation, scaling up the batch …
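DP-SGD's mechanics: clip each example's gradient to an L2 bound, average, and add Gaussian noise calibrated to that bound. A single-step sketch of the standard algorithm (illustrative names; privacy accounting and the paper's specific implementation details are omitted):

```python
import torch

def dp_sgd_step(params, per_example_grads, clip_norm=1.0,
                noise_multiplier=1.0, lr=0.1):
    # per_example_grads: one list of gradients (aligned with params)
    # per example. Clip each example's total gradient to L2 norm
    # clip_norm, sum, add Gaussian noise scaled to the clip bound,
    # then take an averaged SGD step.
    summed = [torch.zeros_like(p) for p in params]
    for grads in per_example_grads:
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, clip_norm / (norm.item() + 1e-12))
        for s, g in zip(summed, grads):
            s.add_(g, alpha=scale)
    batch_size = len(per_example_grads)
    for p, s in zip(params, summed):
        noise = torch.randn_like(p) * noise_multiplier * clip_norm
        p.data.add_((s + noise) / batch_size, alpha=-lr)
```

The abstract's emphasis on batch size follows from the noise being added once per batch: with a larger batch, the noise's share of the averaged update shrinks.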