Applications and techniques for fast machine learning in science
In this community review report, we discuss applications and techniques for fast machine learning (ML) in science—the concept of integrating powerful ML methods into the real-time …
Symbolic discovery of optimization algorithms
We present a method to formulate algorithm discovery as program search, and apply it to discover optimization algorithms for deep neural network training. We leverage efficient …
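The best-known result of this search is the Lion optimizer. Below is a minimal NumPy sketch of its sign-based update as commonly implemented; the hyperparameter defaults here are illustrative, not tuned values from the paper.

```python
import numpy as np

def lion_step(w, g, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.01):
    """One Lion step: the sign of an interpolated momentum, plus
    decoupled weight decay. Defaults are illustrative only."""
    update = np.sign(beta1 * m + (1 - beta1) * g)  # sign of interpolated momentum
    w = w - lr * (update + wd * w)                 # decoupled weight decay
    m = beta2 * m + (1 - beta2) * g                # momentum tracks the raw gradient
    return w, m
```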
Survey of optimization algorithms in modern neural networks
The main goal of machine learning is the creation of self-learning algorithms in many areas of human activity, allowing artificial intelligence to replace a person in seeking to …
On neural differential equations
P Kidger - arXiv preprint arXiv:2202.02435, 2022 - arxiv.org
The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks …
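As a toy illustration of the NDE idea, the sketch below parameterizes a vector field dy/dt = f_theta(y, t) with a tiny network and integrates it with explicit Euler; the network and solver are placeholders of my own, not anything from the monograph.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 2)), np.zeros(16)  # toy parameters of f_theta
W2, b2 = rng.normal(size=(1, 16)), np.zeros(1)

def f_theta(y, t):
    """Neural vector field: dy/dt = f_theta(y, t)."""
    h = np.tanh(W1 @ np.array([y, t]) + b1)
    return (W2 @ h + b2)[0]

def odeint_euler(y0, t0, t1, steps=100):
    """Explicit Euler integration of the neural ODE (simplest possible solver)."""
    y, t, dt = y0, t0, (t1 - t0) / steps
    for _ in range(steps):
        y = y + dt * f_theta(y, t)
        t = t + dt
    return y

print(odeint_euler(y0=1.0, t0=0.0, t1=1.0))
```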
A modified Adam algorithm for deep neural network optimization
Deep Neural Networks (DNNs) are widely regarded as the most effective learning tool for dealing with large datasets, and they have been successfully used in thousands of …
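For context, here is a sketch of the standard Adam update (Kingma & Ba) that such modifications start from; the paper's specific modification is not reproduced here.

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam step: bias-corrected first and second moment
    estimates drive a per-coordinate adaptive update. t counts from 1."""
    m = beta1 * m + (1 - beta1) * g     # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * g**2  # second moment (uncentered variance)
    m_hat = m / (1 - beta1**t)          # bias correction for the warm-up phase
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```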
Sophia: A scalable stochastic second-order optimizer for language model pre-training
Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction in the time and cost of training …
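Sophia preconditions gradient momentum with a cheap diagonal Hessian estimate and clips the result per coordinate. Below is a simplified sketch of that update shape; the Hessian estimator is stubbed out (in the paper it is refreshed only every few steps) and the constants are illustrative.

```python
import numpy as np

def sophia_step(w, g, m, h, lr=1e-4, beta1=0.9, gamma=0.01, eps=1e-12, clip=1.0):
    """Simplified Sophia-style step: gradient momentum divided by a diagonal
    Hessian estimate h (maintained externally by an estimator not shown),
    then clipped per coordinate. Constants here are illustrative."""
    m = beta1 * m + (1 - beta1) * g  # EMA of gradients
    update = np.clip(m / np.maximum(gamma * h, eps), -clip, clip)
    return w - lr * update, m
```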
Efficient dataset distillation using random feature approximation
Dataset distillation compresses large datasets into smaller synthetic coresets which retain performance with the aim of reducing the storage and computational burden of processing …
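The random feature approximation in question is of the Rahimi–Recht kind: a random map z such that z(x)·z(y) approximates a kernel, which makes kernel-based distillation objectives cheap to evaluate. A minimal sketch for the RBF kernel follows; the paper's full distillation pipeline is omitted.

```python
import numpy as np

def random_fourier_features(X, n_features=256, gamma=1.0, seed=0):
    """Map inputs so that z(x) @ z(y) approximates the RBF kernel
    exp(-gamma * ||x - y||^2) (Rahimi & Recht random Fourier features)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = np.random.default_rng(1).normal(size=(5, 3))
Z = random_fourier_features(X)
print(Z @ Z.T)  # approximately the RBF kernel matrix of X
```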
Multimodal fusion with co-attention networks for fake news detection
Y Wu, P Zhan, Y Zhang, L Wang… - Findings of the association …, 2021 - aclanthology.org
Fake news with textual and visual content has better story-telling ability than text-only content, and can spread quickly on social media. People can be easily deceived by …
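Co-attention here means each modality attends over the other's features. A minimal single-head sketch of that mechanism is below; the shapes and the absence of learned projections are simplifications of my own, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(text, image, d=64):
    """Single-head co-attention: text tokens attend over image regions and
    vice versa, so each modality is summarized conditioned on the other."""
    # text: (n_tokens, d), image: (n_regions, d); projections omitted for brevity
    t2i = softmax(text @ image.T / np.sqrt(d)) @ image  # text attends to image
    i2t = softmax(image @ text.T / np.sqrt(d)) @ text   # image attends to text
    return t2i, i2t

rng = np.random.default_rng(0)
t2i, i2t = co_attention(rng.normal(size=(7, 64)), rng.normal(size=(5, 64)))
print(t2i.shape, i2t.shape)  # (7, 64) (5, 64)
```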
Surrogate gap minimization improves sharpness-aware training
The recently proposed Sharpness-Aware Minimization (SAM) improves generalization by minimizing a perturbed loss defined as the maximum loss within a neighborhood in …
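The perturbed loss is max over ||e|| <= rho of L(w + e), and SAM approximates the inner maximum with a single normalized gradient ascent step. A minimal sketch of that two-step procedure follows; GSAM's surrogate-gap term is not shown.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05, eps=1e-12):
    """One SAM step: (1) ascend to the approximate worst point within an
    L2 ball of radius rho, (2) descend using the gradient taken there."""
    g = grad_fn(w)
    e = rho * g / (np.linalg.norm(g) + eps)  # first-order solution of the inner max
    g_adv = grad_fn(w + e)                   # gradient at the perturbed point
    return w - lr * g_adv

# Toy usage on L(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
print(sam_step(w, grad_fn=lambda w: 2 * w))
```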
Adan: Adaptive Nesterov momentum algorithm for faster optimizing deep models
In deep learning, different kinds of deep networks typically need different optimizers, which have to be chosen after multiple trials, making the training process inefficient. To relieve this …
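For orientation, here is the classical Nesterov momentum update that Adan reworks into an adaptive method; Adan's actual update, which tracks gradients and gradient differences with separate moving averages, is more involved than this sketch.

```python
import numpy as np

def nesterov_step(w, grad_fn, v, lr=0.01, mu=0.9):
    """Classical Nesterov momentum: evaluate the gradient at the
    look-ahead point w + mu * v, then update velocity and weights."""
    g = grad_fn(w + mu * v)  # gradient at the look-ahead position
    v = mu * v - lr * g
    return w + v, v

# Toy usage: minimize L(w) = ||w||^2, whose gradient is 2w.
w, v = np.array([1.0, -2.0]), np.zeros(2)
for _ in range(100):
    w, v = nesterov_step(w, lambda x: 2 * x, v)
print(w)  # close to the optimum at the origin
```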