Applications and techniques for fast machine learning in science

AMC Deiana, N Tran, J Agar, M Blott… - Frontiers in big …, 2022 - frontiersin.org
In this community review report, we discuss applications and techniques for fast machine
learning (ML) in science—the concept of integrating powerful ML methods into the real-time …

Symbolic discovery of optimization algorithms

X Chen, C Liang, D Huang, E Real… - Advances in neural …, 2024 - proceedings.neurips.cc
We present a method to formulate algorithm discovery as program search, and apply it to
discover optimization algorithms for deep neural network training. We leverage efficient …
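The strongest algorithm found by this search is the sign-based optimizer Lion. A minimal NumPy sketch of its update rule (interpolated momentum, a sign-of-update step, and decoupled weight decay); the hyperparameter defaults shown are illustrative:

    import numpy as np

    def lion_step(w, m, g, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.01):
        # Sign of an interpolation between momentum and the fresh gradient,
        # plus decoupled weight decay (the Lion update from this paper).
        update = np.sign(beta1 * m + (1 - beta1) * g) + wd * w
        w = w - lr * update
        m = beta2 * m + (1 - beta2) * g  # momentum is an EMA of the gradient
        return w, m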

Survey of optimization algorithms in modern neural networks

R Abdulkadirov, P Lyakhov, N Nagornov - Mathematics, 2023 - mdpi.com
The main goal of machine learning is the creation of self-learning algorithms for many areas of human activity, allowing artificial intelligence to take the place of a person in seeking to …

On neural differential equations

P Kidger - arXiv preprint arXiv:2202.02435, 2022 - arxiv.org
The conjoining of dynamical systems and deep learning has become a topic of great
interest. In particular, neural differential equations (NDEs) demonstrate that neural networks …
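The canonical instance is the neural ODE, in which a network f_θ parameterizes a vector field dy/dt = f_θ(t, y) and the model output is the numerical solution of that ODE. A minimal sketch with a fixed-step Euler solver; the two-layer MLP vector field and step count are assumptions for illustration:

    import numpy as np

    def neural_ode_euler(y0, theta, t0=0.0, t1=1.0, steps=100):
        # Integrate dy/dt = f_theta(t, y) with fixed-step Euler.
        W1, b1, W2, b2 = theta  # parameters of a small MLP vector field

        def f(t, y):
            h = np.tanh(W1 @ y + b1)
            return W2 @ h + b2

        y, dt = y0, (t1 - t0) / steps
        for i in range(steps):
            y = y + dt * f(t0 + i * dt, y)
        return y

In practice θ is trained by backpropagating through the solver or with the adjoint method, and adaptive solvers are used rather than fixed-step Euler.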

A modified Adam algorithm for deep neural network optimization

M Reyad, AM Sarhan, M Arafa - Neural Computing and Applications, 2023 - Springer
Deep Neural Networks (DNNs) are widely regarded as the most effective learning
tool for dealing with large datasets, and they have been successfully used in thousands of …
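For reference, a sketch of the vanilla Adam update that such variants build on; the paper's specific modification is not reproduced here:

    import numpy as np

    def adam_step(w, m, v, g, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        # Standard Adam with bias correction; t is the 1-indexed step count.
        m = b1 * m + (1 - b1) * g        # first-moment EMA
        v = b2 * v + (1 - b2) * g * g    # second-moment EMA
        m_hat = m / (1 - b1 ** t)        # bias-corrected moments
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v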

Sophia: A scalable stochastic second-order optimizer for language model pre-training

H Liu, Z Li, D Hall, P Liang, T Ma - arXiv preprint arXiv:2305.14342, 2023 - arxiv.org
Given the massive cost of language model pre-training, a non-trivial improvement of the
optimization algorithm would lead to a material reduction in the time and cost of training …
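Sophia keeps an EMA of the gradient and a cheap EMA estimate of the Hessian diagonal, refreshed only every k steps, and takes a per-coordinate clipped Newton-like step. A hedged sketch of the Hutchinson-probe variant; hvp is an assumed user-supplied Hessian-vector product and the defaults shown are illustrative:

    import numpy as np

    def sophia_step(w, m, h, g, hvp, t, lr=1e-4, b1=0.965, b2=0.99,
                    gamma=0.01, k=10, eps=1e-12, rng=np.random.default_rng()):
        # Clipped ratio of gradient momentum to a Hessian-diagonal EMA.
        m = b1 * m + (1 - b1) * g
        if t % k == 0:  # refresh the curvature estimate only occasionally
            u = rng.choice([-1.0, 1.0], size=w.shape)  # Rademacher probe
            h = b2 * h + (1 - b2) * u * hvp(u)         # Hutchinson diagonal estimate
        w = w - lr * np.clip(m / np.maximum(gamma * h, eps), -1.0, 1.0)
        return w, m, h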

Efficient dataset distillation using random feature approximation

N Loo, R Hasani, A Amini… - Advances in Neural …, 2022 - proceedings.neurips.cc
Dataset distillation compresses large datasets into smaller synthetic coresets that retain
performance, with the aim of reducing the storage and computational burden of processing …
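The underlying problem is usually stated as a bilevel objective: learn a small synthetic set S such that a model trained on S still fits the real training set T,

    \min_{S} \; \mathcal{L}_{T}\big(\theta^{*}(S)\big)
    \quad \text{s.t.} \quad
    \theta^{*}(S) = \arg\min_{\theta} \mathcal{L}_{S}(\theta),

with this paper's contribution, per its title, being to make the inner problem cheap via a random-feature approximation of the relevant kernel.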

Multimodal fusion with co-attention networks for fake news detection

Y Wu, P Zhan, Y Zhang, L Wang… - Findings of the association …, 2021 - aclanthology.org
Fake news that combines textual and visual content tells a more compelling story than text
alone, and it can spread quickly on social media. People can be easily deceived by …
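Co-attention fuses the two modalities by letting text and image features attend over each other through a shared affinity matrix. A generic single-round sketch, not the paper's exact architecture; the shapes and the dot-product affinity are assumptions:

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def co_attention(T, V):
        # T: (n, d) text token features; V: (m, d) visual region features.
        A = T @ V.T                       # (n, m) affinity matrix
        T_ctx = softmax(A, axis=1) @ V    # each token attends over image regions
        V_ctx = softmax(A.T, axis=1) @ T  # each region attends over text tokens
        return T_ctx, V_ctx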

Surrogate gap minimization improves sharpness-aware training

J Zhuang, B Gong, L Yuan, Y Cui, H Adam… - arXiv preprint arXiv …, 2022 - arxiv.org
The recently proposed Sharpness-Aware Minimization (SAM) improves generalization by
minimizing a perturbed loss, defined as the maximum loss within a neighborhood in …
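In symbols, the perturbed loss and its usual first-order approximation are

    L^{\mathrm{SAM}}(w) = \max_{\|\epsilon\|_2 \le \rho} L(w + \epsilon)
    \approx L\!\Big(w + \rho \, \frac{\nabla L(w)}{\|\nabla L(w)\|_2}\Big),

and the surrogate gap of the title is h(w) = L^{SAM}(w) - L(w), a sharpness measure that this paper (GSAM) minimizes jointly with the perturbed loss.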

Adan: Adaptive nesterov momentum algorithm for faster optimizing deep models

X Xie, P Zhou, H Li, Z Lin, S Yan - arXiv preprint arXiv:2208.06677, 2022 - arxiv.org
In deep learning, different kinds of deep networks typically need different optimizers, which
must be chosen through repeated trials, making the training process inefficient. To relieve this …
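Adan combines a gradient EMA with an EMA of gradient differences (a Nesterov-style correction) and an adaptive second-moment scale. A hedged NumPy sketch of the update as I read the paper's conventions; the defaults shown are illustrative:

    import numpy as np

    def adan_step(w, m, v, n, g, g_prev, lr=1e-3, b1=0.02, b2=0.08,
                  b3=0.01, wd=0.0, eps=1e-8):
        # Nesterov-corrected momentum with adaptive per-coordinate scaling.
        d = g - g_prev                         # gradient difference
        m = (1 - b1) * m + b1 * g              # gradient EMA
        v = (1 - b2) * v + b2 * d              # difference EMA
        n = (1 - b3) * n + b3 * (g + (1 - b2) * d) ** 2  # second moment
        w = (w - lr * (m + (1 - b2) * v) / (np.sqrt(n) + eps)) / (1 + lr * wd)
        return w, m, v, n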