Deep learning in electron microscopy

JM Ede - Machine Learning: Science and Technology, 2021 - iopscience.iop.org
Deep learning is transforming most areas of science and technology, including electron
microscopy. This review paper offers a practical perspective aimed at developers with …

Multi-task self-supervised learning for robust speech recognition

M Ravanelli, J Zhong, S Pascual… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Despite the growing interest in unsupervised learning, extracting meaningful knowledge
from unlabelled audio remains an open challenge. To take a step in this direction, we …

An improved analysis of stochastic gradient descent with momentum

Y Liu, Y Gao, W Yin - Advances in Neural Information …, 2020 - proceedings.neurips.cc
SGD with momentum (SGDM) has been widely applied in many machine learning tasks, often
with dynamic stepsizes and momentum weights tuned in a stagewise …
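
For orientation, a minimal sketch of the heavy-ball SGDM update with a stagewise (piecewise-constant) stepsize schedule is given below; the names grad_fn, stage_stepsizes and iters_per_stage are illustrative, and the specific schedules analysed in the paper are not reproduced.

    import numpy as np

    def sgdm(grad_fn, x0, stage_stepsizes, momentum=0.9, iters_per_stage=1000):
        # Heavy-ball SGD with momentum and a stagewise (piecewise-constant)
        # stepsize schedule; an illustrative sketch, not the paper's exact scheme.
        x = np.asarray(x0, dtype=float)
        v = np.zeros_like(x)                 # momentum buffer
        for eta in stage_stepsizes:          # one constant stepsize per stage
            for _ in range(iters_per_stage):
                g = grad_fn(x)               # stochastic gradient estimate at x
                v = momentum * v + g         # accumulate momentum (heavy ball)
                x = x - eta * v              # parameter update
        return x

A call such as sgdm(grad_fn, x0=np.zeros(d), stage_stepsizes=[0.1, 0.01, 0.001]) runs three stages with decreasing stepsizes.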

Understanding the role of training regimes in continual learning

SI Mirzadeh, M Farajtabar, R Pascanu… - Advances in …, 2020 - proceedings.neurips.cc
Catastrophic forgetting affects the training of neural networks, limiting their ability to learn
multiple tasks sequentially. From the perspective of the well-established plasticity-stability …

How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?

J Wu, D Zou, Z Chen, V Braverman, Q Gu… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL)
capabilities, enabling them to solve unseen tasks solely based on input contexts without …
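
A common formalization in this line of work (whether it matches the paper's exact setup is an assumption): each pretraining task draws a weight vector w from a prior, the prompt consists of noisy linear observations, and the transformer T_\theta is trained to predict the response for a query input,

    y_i = \langle w, x_i \rangle + \varepsilon_i, \qquad
    T_\theta\big((x_1, y_1), \ldots, (x_n, y_n), x_{\mathrm{query}}\big) \approx \langle w, x_{\mathrm{query}} \rangle,

so that solving a new task in context amounts to implicitly fitting a fresh linear regression from the prompt alone.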

On the generalization benefit of noise in stochastic gradient descent

S Smith, E Elsen, S De - International Conference on …, 2020 - proceedings.mlr.press
It has long been argued that minibatch stochastic gradient descent can generalize better
than large batch gradient descent in deep neural networks. However, recent papers have …
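
To make the noise in question concrete, minibatch SGD can be written as the full-batch gradient plus a sampling-noise term whose covariance shrinks as the batch size B grows (a generic decomposition, not notation taken from the paper):

    \theta_{t+1} = \theta_t - \eta\, \hat g_t, \qquad
    \hat g_t = \nabla L(\theta_t) + \xi_t, \qquad
    \operatorname{Cov}(\xi_t) \propto \tfrac{1}{B},

so large-batch training injects less gradient noise per step, which is the regime being compared against.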

Closing the generalization gap of adaptive gradient methods in training deep neural networks

J Chen, D Zhou, Y Tang, Z Yang, Y Cao… - arXiv preprint arXiv …, 2018 - arxiv.org
Adaptive gradient methods, which adopt historical gradient information to automatically
adjust the learning rate, despite their fast convergence, have been observed …
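
As background, a prototypical adaptive update (Adam without bias correction) rescales each coordinate by a running estimate of the squared gradients; this is the generic family referred to here, not the specific remedy proposed in the paper:

    m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
    v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2, \qquad
    \theta_{t+1} = \theta_t - \frac{\eta\, m_t}{\sqrt{v_t} + \epsilon}.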

Last-iterate convergence: Zero-sum games and constrained min-max optimization

C Daskalakis, I Panageas - arXiv preprint arXiv:1807.04252, 2018 - arxiv.org
Motivated by applications in Game Theory, Optimization, and Generative Adversarial
Networks, recent work of Daskalakis et al. [DISZ17] and follow-up work of Liang and …
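
The setting here and in the next entry is the (possibly constrained) min-max problem

    \min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} f(x, y),

where the question is whether the last iterate (x_T, y_T) itself converges to an equilibrium, as opposed to the ergodic average \bar{x}_T = \frac{1}{T}\sum_{t=1}^{T} x_t (a standard formulation; the notation is not taken from either paper).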

Last iterate is slower than averaged iterate in smooth convex-concave saddle point problems

N Golowich, S Pattathil, C Daskalakis… - … on Learning Theory, 2020 - proceedings.mlr.press
In this paper we study the smooth convex-concave saddle point problem. Specifically, we
analyze the last iterate convergence properties of the Extragradient (EG) algorithm. It is well …
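
For reference, the Extragradient step for the saddle-point operator F(z) = (\nabla_x f(x,y), -\nabla_y f(x,y)) first extrapolates and then updates from the extrapolated gradient (a standard write-up, not necessarily the paper's exact notation):

    z_{t+1/2} = P_{\mathcal{Z}}\big(z_t - \eta F(z_t)\big), \qquad
    z_{t+1} = P_{\mathcal{Z}}\big(z_t - \eta F(z_{t+1/2})\big),

with P_{\mathcal{Z}} the projection onto the constraint set; the question studied is how fast z_T itself, rather than the averaged iterate, approaches the saddle point.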

Generalized Polyak step size for first order optimization with momentum

X Wang, M Johansson, T Zhang - … Conference on Machine …, 2023 - proceedings.mlr.press
In machine learning applications, it is well known that carefully designed learning rate (step
size) schedules can significantly improve the convergence of commonly used first-order …
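
The classical Polyak step size that the title generalizes picks the stepsize from the current suboptimality gap (the momentum-based generalization proposed in the paper is not reproduced here):

    \eta_k = \frac{f(x_k) - f^{\star}}{\|\nabla f(x_k)\|^2},

where f^{\star} is the optimal value or a known lower bound on it.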