Optimization for deep learning: An overview

RY Sun - Journal of the Operations Research Society of China, 2020 - Springer
Optimization is a critical component in deep learning. We think optimization for neural
networks is an interesting topic for theoretical research for various reasons. First, its …

Piecewise linear neural networks and deep learning

Q Tao, L Li, X Huang, X Xi, S Wang… - Nature Reviews Methods …, 2022 - nature.com
As a powerful modelling method, piecewise linear neural networks (PWLNNs) have proven
successful in various fields, most recently in deep learning. To apply PWLNN methods, both …
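For context: a ReLU network is itself piecewise linear, so within each activation region it reduces to a fixed affine map x ↦ Ax + b. A minimal sketch of that fact, assuming PyTorch (the network and the probe point are illustrative, not from the paper):

```python
import torch
import torch.nn as nn

# A ReLU network is piecewise linear: within each activation region it
# acts as a single affine map x -> A x + b. Recover (A, b) at a point x;
# the Jacobian is constant throughout x's linear region.
net = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
x = torch.randn(2)
A = torch.autograd.functional.jacobian(net, x)  # shape (1, 2): local slope
b = net(x) - A @ x                              # local offset
assert torch.allclose(net(x), A @ x + b)
```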

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Att3d: Amortized text-to-3d object synthesis

J Lorraine, K Xie, X Zeng, CH Lin… - Proceedings of the …, 2023 - openaccess.thecvf.com
Text-to-3D modelling has seen exciting progress by combining generative text-to-image
models with image-to-3D methods like Neural Radiance Fields. DreamFusion recently …

Knowledge distillation: A good teacher is patient and consistent

L Beyer, X Zhai, A Royer, L Markeeva… - Proceedings of the …, 2022 - openaccess.thecvf.com
There is a growing discrepancy in computer vision between large-scale models that achieve
state-of-the-art performance and models that are affordable in practical applications. In this …
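The loss being distilled here is the standard softened-softmax objective; the paper's contribution is the training recipe (consistent augmentations for teacher and student, patient long schedules), not a new loss. A minimal sketch of that standard loss, assuming PyTorch; the temperature and the mixing weight alpha are illustrative defaults, and recipes vary on whether to include the hard-label term at all:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard knowledge-distillation loss: KL divergence between
    temperature-softened teacher and student distributions, mixed with
    cross-entropy on the ground-truth labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # T^2 keeps soft-target gradients on the usual scale
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```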

Sophia: A scalable stochastic second-order optimizer for language model pre-training

H Liu, Z Li, D Hall, P Liang, T Ma - arXiv preprint arXiv:2305.14342, 2023 - arxiv.org
Given the massive cost of language model pre-training, a non-trivial improvement of the
optimization algorithm would lead to a material reduction in the time and cost of training …
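The core of Sophia is a diagonally preconditioned, elementwise-clipped update: an EMA of gradients divided by an EMA of a cheap diagonal Hessian estimate (refreshed only every few steps), with clipping to bound the worst-case step. A hedged sketch of a single parameter update, assuming PyTorch; hyperparameter names follow the paper's notation, values are illustrative:

```python
import torch

@torch.no_grad()
def sophia_update(param, grad, m, h, lr=1e-4, beta1=0.96, rho=0.04, eps=1e-12):
    """One Sophia-style step (sketch). m: EMA of gradients. h: EMA of a
    diagonal Hessian estimate, maintained elsewhere and refreshed every k
    steps (e.g. with the paper's Gauss-Newton-Bartlett estimator)."""
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    # Precondition by the Hessian estimate, then clip elementwise to [-1, 1]
    step = torch.clamp(m / torch.clamp(rho * h, min=eps), -1.0, 1.0)
    param.add_(step, alpha=-lr)
```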

Cramming: Training a language model on a single GPU in one day

J Geiping, T Goldstein - International Conference on …, 2023 - proceedings.mlr.press
Recent trends in language modeling have focused on increasing performance through
scaling, and have resulted in an environment where training language models is out of …

PyHessian: Neural networks through the lens of the Hessian

Z Yao, A Gholami, K Keutzer… - 2020 IEEE international …, 2020 - ieeexplore.ieee.org
We present PyHessian, a new scalable framework that enables fast computation of
Hessian (i.e., second-order derivative) information for deep neural networks. PyHessian …
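The primitive such tools build on is the Hessian-vector product via double backpropagation, which never materialises the full Hessian. A minimal sketch of that primitive, assuming PyTorch (see PyHessian itself for its actual API, which wraps this in eigenvalue, trace, and spectral-density estimators):

```python
import torch

def hessian_vector_product(loss, params, vec):
    """Compute H @ vec by double backprop: differentiate <grad(loss), vec>.
    params and vec are matching lists of tensors; the Hessian is never
    formed explicitly, so this scales to large networks."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    dot = sum((g * v).sum() for g, v in zip(grads, vec))
    return torch.autograd.grad(dot, params)
```

Power iteration on this product yields the top Hessian eigenvalues, one of the quantities such frameworks report.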

No train no gain: Revisiting efficient training algorithms for transformer-based language models

J Kaddour, O Key, P Nawrot… - Advances in Neural …, 2024 - proceedings.neurips.cc
The computation necessary for training Transformer-based language models has
skyrocketed in recent years. This trend has motivated research on efficient training …

Large-scale differentially private BERT

R Anil, B Ghazi, V Gupta, R Kumar… - arXiv preprint arXiv …, 2021 - arxiv.org
In this work, we study the large-scale pretraining of BERT-Large with differentially private
SGD (DP-SGD). We show that, combined with a careful implementation, scaling up the batch …
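DP-SGD itself is mechanically simple: clip each per-example gradient to a fixed norm, average, and add Gaussian noise calibrated to that clip norm; the paper's subject is making this scale to BERT-Large pretraining with very large batches. A hedged sketch of one step, assuming PyTorch and that per-example gradients are already available (e.g. from a vectorised-gradient library); the function name and defaults are illustrative:

```python
import torch

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, noise_mult=1.0):
    """One DP-SGD step (sketch): per-example clipping + Gaussian noise.
    per_example_grads: list over examples, each a list of per-parameter grads."""
    n = len(per_example_grads)
    summed = [torch.zeros_like(p) for p in params]
    for grads in per_example_grads:
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        coef = (clip_norm / (norm + 1e-12)).clamp(max=1.0)  # clip to clip_norm
        for s, g in zip(summed, grads):
            s.add_(g * coef)
    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = torch.randn_like(s) * (noise_mult * clip_norm)
            p.add_((s + noise) / n, alpha=-lr)
```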