Applications and techniques for fast machine learning in science

AMC Deiana, N Tran, J Agar, M Blott… - Frontiers in big …, 2022 - frontiersin.org
In this community review report, we discuss applications and techniques for fast machine
learning (ML) in science—the concept of integrating powerful ML methods into the real-time …

Symbolic discovery of optimization algorithms

X Chen, C Liang, D Huang, E Real… - Advances in neural …, 2024 - proceedings.neurips.cc
We present a method to formulate algorithm discovery as program search, and apply it to
discover optimization algorithms for deep neural network training. We leverage efficient …
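The strongest algorithm found by this search is the sign-based optimizer Lion. A minimal NumPy sketch of its update rule (interpolated momentum, a sign-of-update step, and decoupled weight decay); the hyperparameter defaults shown are illustrative:

    import numpy as np

    def lion_step(w, m, g, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.01):
        # Sign of an interpolation between momentum and the fresh gradient,
        # plus decoupled weight decay (the Lion update from this paper).
        update = np.sign(beta1 * m + (1 - beta1) * g) + wd * w
        w = w - lr * update
        m = beta2 * m + (1 - beta2) * g  # momentum is an EMA of the gradient
        return w, m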

Survey of optimization algorithms in modern neural networks

R Abdulkadirov, P Lyakhov, N Nagornov - Mathematics, 2023 - mdpi.com
The main goal of machine learning is the creation of self-learning algorithms for many areas of human activity, allowing artificial intelligence to take the place of a person in seeking to …

On neural differential equations

P Kidger - arXiv preprint arXiv:2202.02435, 2022 - arxiv.org
The conjoining of dynamical systems and deep learning has become a topic of great
interest. In particular, neural differential equations (NDEs) demonstrate that neural networks …
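The canonical instance is the neural ODE, in which a network f_θ parameterizes a vector field dy/dt = f_θ(t, y) and the model output is the numerical solution of that ODE. A minimal sketch with a fixed-step Euler solver; the two-layer MLP vector field and step count are assumptions for illustration:

    import numpy as np

    def neural_ode_euler(y0, theta, t0=0.0, t1=1.0, steps=100):
        # Integrate dy/dt = f_theta(t, y) with fixed-step Euler.
        W1, b1, W2, b2 = theta  # parameters of a small MLP vector field

        def f(t, y):
            h = np.tanh(W1 @ y + b1)
            return W2 @ h + b2

        y, dt = y0, (t1 - t0) / steps
        for i in range(steps):
            y = y + dt * f(t0 + i * dt, y)
        return y

In practice θ is trained by backpropagating through the solver or with the adjoint method, and adaptive solvers are used rather than fixed-step Euler.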

A modified Adam algorithm for deep neural network optimization

M Reyad, AM Sarhan, M Arafa - Neural Computing and Applications, 2023 - Springer
Deep Neural Networks (DNNs) are widely regarded as the most effective learning
tool for dealing with large datasets, and they have been successfully used in thousands of …
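For reference, a sketch of the vanilla Adam update that such variants build on; the paper's specific modification is not reproduced here:

    import numpy as np

    def adam_step(w, m, v, g, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        # Standard Adam with bias correction; t is the 1-indexed step count.
        m = b1 * m + (1 - b1) * g        # first-moment EMA
        v = b2 * v + (1 - b2) * g * g    # second-moment EMA
        m_hat = m / (1 - b1 ** t)        # bias-corrected moments
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v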

Sophia: A scalable stochastic second-order optimizer for language model pre-training

H Liu, Z Li, D Hall, P Liang, T Ma - arXiv preprint arXiv:2305.14342, 2023 - arxiv.org
Given the massive cost of language model pre-training, a non-trivial improvement of the
optimization algorithm would lead to a material reduction in the time and cost of training …
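Sophia keeps an EMA of the gradient and a cheap EMA estimate of the Hessian diagonal, refreshed only every k steps, and takes a per-coordinate clipped Newton-like step. A hedged sketch of the Hutchinson-probe variant; hvp is an assumed user-supplied Hessian-vector product and the defaults shown are illustrative:

    import numpy as np

    def sophia_step(w, m, h, g, hvp, t, lr=1e-4, b1=0.965, b2=0.99,
                    gamma=0.01, k=10, eps=1e-12, rng=np.random.default_rng()):
        # Clipped ratio of gradient momentum to a Hessian-diagonal EMA.
        m = b1 * m + (1 - b1) * g
        if t % k == 0:  # refresh the curvature estimate only occasionally
            u = rng.choice([-1.0, 1.0], size=w.shape)  # Rademacher probe
            h = b2 * h + (1 - b2) * u * hvp(u)         # Hutchinson diagonal estimate
        w = w - lr * np.clip(m / np.maximum(gamma * h, eps), -1.0, 1.0)
        return w, m, h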

Efficient dataset distillation using random feature approximation

N Loo, R Hasani, A Amini… - Advances in Neural …, 2022 - proceedings.neurips.cc
Dataset distillation compresses large datasets into smaller synthetic coresets that retain
performance, with the aim of reducing the storage and computational burden of processing …
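The underlying problem is usually stated as a bilevel objective: learn a small synthetic set S such that a model trained on S still fits the real training set T,

    \min_{S} \; \mathcal{L}_{T}\big(\theta^{*}(S)\big)
    \quad \text{s.t.} \quad
    \theta^{*}(S) = \arg\min_{\theta} \mathcal{L}_{S}(\theta),

with this paper's contribution, per its title, being to make the inner problem cheap via a random-feature approximation of the relevant kernel.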

Multimodal fusion with co-attention networks for fake news detection

Y Wu, P Zhan, Y Zhang, L Wang… - Findings of the association …, 2021 - aclanthology.org
Fake news that combines textual and visual content tells a more compelling story than text
alone, and it can spread quickly on social media. People can be easily deceived by …
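Co-attention fuses the two modalities by letting text and image features attend over each other through a shared affinity matrix. A generic single-round sketch, not the paper's exact architecture; the shapes and the dot-product affinity are assumptions:

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def co_attention(T, V):
        # T: (n, d) text token features; V: (m, d) visual region features.
        A = T @ V.T                       # (n, m) affinity matrix
        T_ctx = softmax(A, axis=1) @ V    # each token attends over image regions
        V_ctx = softmax(A.T, axis=1) @ T  # each region attends over text tokens
        return T_ctx, V_ctx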

Surrogate gap minimization improves sharpness-aware training

J Zhuang, B Gong, L Yuan, Y Cui, H Adam… - arXiv preprint arXiv …, 2022 - arxiv.org
The recently proposed Sharpness-Aware Minimization (SAM) improves generalization by
minimizing a perturbed loss, defined as the maximum loss within a neighborhood in …
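In symbols, the perturbed loss and its usual first-order approximation are

    L^{\mathrm{SAM}}(w) = \max_{\|\epsilon\|_2 \le \rho} L(w + \epsilon)
    \approx L\!\Big(w + \rho \, \frac{\nabla L(w)}{\|\nabla L(w)\|_2}\Big),

and the surrogate gap of the title is h(w) = L^{SAM}(w) - L(w), a sharpness measure that this paper (GSAM) minimizes jointly with the perturbed loss.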

Adan: Adaptive nesterov momentum algorithm for faster optimizing deep models

X Xie, P Zhou, H Li, Z Lin, S Yan - arXiv preprint arXiv:2208.06677, 2022 - arxiv.org
In deep learning, different kinds of deep networks typically need different optimizers, which
must be chosen through repeated trials, making the training process inefficient. To relieve this …
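Adan combines a gradient EMA with an EMA of gradient differences (a Nesterov-style correction) and an adaptive second-moment scale. A hedged NumPy sketch of the update as I read the paper's conventions; the defaults shown are illustrative:

    import numpy as np

    def adan_step(w, m, v, n, g, g_prev, lr=1e-3, b1=0.02, b2=0.08,
                  b3=0.01, wd=0.0, eps=1e-8):
        # Nesterov-corrected momentum with adaptive per-coordinate scaling.
        d = g - g_prev                         # gradient difference
        m = (1 - b1) * m + b1 * g              # gradient EMA
        v = (1 - b2) * v + b2 * d              # difference EMA
        n = (1 - b3) * n + b3 * (g + (1 - b2) * d) ** 2  # second moment
        w = (w - lr * (m + (1 - b2) * v) / (np.sqrt(n) + eps)) / (1 + lr * wd)
        return w, m, v, n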