Deep learning in electron microscopy

JM Ede - Machine Learning: Science and Technology, 2021 - iopscience.iop.org
Deep learning is transforming most areas of science and technology, including electron
microscopy. This review paper offers a practical perspective aimed at developers with …

Multi-task self-supervised learning for robust speech recognition

M Ravanelli, J Zhong, S Pascual… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Despite the growing interest in unsupervised learning, extracting meaningful knowledge
from unlabelled audio remains an open challenge. To take a step in this direction, we …

An improved analysis of stochastic gradient descent with momentum

Y Liu, Y Gao, W Yin - Advances in Neural Information …, 2020 - proceedings.neurips.cc
SGD with momentum (SGDM) has been widely applied in many machine learning tasks, often
with dynamic stepsizes and momentum weights tuned in a stagewise …
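
For orientation, a minimal sketch of the heavy-ball SGDM update with a stagewise (piecewise-constant) stepsize schedule is given below; the names grad_fn, stage_stepsizes and iters_per_stage are illustrative, and the specific schedules analysed in the paper are not reproduced.

    import numpy as np

    def sgdm(grad_fn, x0, stage_stepsizes, momentum=0.9, iters_per_stage=1000):
        # Heavy-ball SGD with momentum and a stagewise (piecewise-constant)
        # stepsize schedule; an illustrative sketch, not the paper's exact scheme.
        x = np.asarray(x0, dtype=float)
        v = np.zeros_like(x)                 # momentum buffer
        for eta in stage_stepsizes:          # one constant stepsize per stage
            for _ in range(iters_per_stage):
                g = grad_fn(x)               # stochastic gradient estimate at x
                v = momentum * v + g         # accumulate momentum (heavy ball)
                x = x - eta * v              # parameter update
        return x

A call such as sgdm(grad_fn, x0=np.zeros(d), stage_stepsizes=[0.1, 0.01, 0.001]) runs three stages with decreasing stepsizes.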

Understanding the role of training regimes in continual learning

SI Mirzadeh, M Farajtabar, R Pascanu… - Advances in …, 2020 - proceedings.neurips.cc
Catastrophic forgetting affects the training of neural networks, limiting their ability to learn
multiple tasks sequentially. From the perspective of the well-established plasticity-stability …

How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?

J Wu, D Zou, Z Chen, V Braverman, Q Gu… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL)
capabilities, enabling them to solve unseen tasks solely based on input contexts without …
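
A common formalization in this line of work (whether it matches the paper's exact setup is an assumption): each pretraining task draws a weight vector w from a prior, the prompt consists of noisy linear observations, and the transformer T_\theta is trained to predict the response for a query input,

    y_i = \langle w, x_i \rangle + \varepsilon_i, \qquad
    T_\theta\big((x_1, y_1), \ldots, (x_n, y_n), x_{\mathrm{query}}\big) \approx \langle w, x_{\mathrm{query}} \rangle,

so that solving a new task in context amounts to implicitly fitting a fresh linear regression from the prompt alone.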

On the generalization benefit of noise in stochastic gradient descent

S Smith, E Elsen, S De - International Conference on …, 2020 - proceedings.mlr.press
It has long been argued that minibatch stochastic gradient descent can generalize better
than large batch gradient descent in deep neural networks. However, recent papers have …
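
To make the noise in question concrete, minibatch SGD can be written as the full-batch gradient plus a sampling-noise term whose covariance shrinks as the batch size B grows (a generic decomposition, not notation taken from the paper):

    \theta_{t+1} = \theta_t - \eta\, \hat g_t, \qquad
    \hat g_t = \nabla L(\theta_t) + \xi_t, \qquad
    \operatorname{Cov}(\xi_t) \propto \tfrac{1}{B},

so large-batch training injects less gradient noise per step, which is the regime being compared against.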

Closing the generalization gap of adaptive gradient methods in training deep neural networks

J Chen, D Zhou, Y Tang, Z Yang, Y Cao… - arXiv preprint arXiv …, 2018 - arxiv.org
Adaptive gradient methods, which adopt historical gradient information to automatically
adjust the learning rate, despite their fast convergence, have been observed …
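
As background, a prototypical adaptive update (Adam without bias correction) rescales each coordinate by a running estimate of the squared gradients; this is the generic family referred to here, not the specific remedy proposed in the paper:

    m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
    v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2, \qquad
    \theta_{t+1} = \theta_t - \frac{\eta\, m_t}{\sqrt{v_t} + \epsilon}.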

Last-iterate convergence: Zero-sum games and constrained min-max optimization

C Daskalakis, I Panageas - arXiv preprint arXiv:1807.04252, 2018 - arxiv.org
Motivated by applications in Game Theory, Optimization, and Generative Adversarial
Networks, recent work of Daskalakis et al. [DISZ17] and follow-up work of Liang and …
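
The setting here and in the next entry is the (possibly constrained) min-max problem

    \min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} f(x, y),

where the question is whether the last iterate (x_T, y_T) itself converges to an equilibrium, as opposed to the ergodic average \bar{x}_T = \frac{1}{T}\sum_{t=1}^{T} x_t (a standard formulation; the notation is not taken from either paper).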

Last iterate is slower than averaged iterate in smooth convex-concave saddle point problems

N Golowich, S Pattathil, C Daskalakis… - … on Learning Theory, 2020 - proceedings.mlr.press
In this paper we study the smooth convex-concave saddle point problem. Specifically, we
analyze the last iterate convergence properties of the Extragradient (EG) algorithm. It is well …
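
For reference, the Extragradient step for the saddle-point operator F(z) = (\nabla_x f(x,y), -\nabla_y f(x,y)) first extrapolates and then updates from the extrapolated gradient (a standard write-up, not necessarily the paper's exact notation):

    z_{t+1/2} = P_{\mathcal{Z}}\big(z_t - \eta F(z_t)\big), \qquad
    z_{t+1} = P_{\mathcal{Z}}\big(z_t - \eta F(z_{t+1/2})\big),

with P_{\mathcal{Z}} the projection onto the constraint set; the question studied is how fast z_T itself, rather than the averaged iterate, approaches the saddle point.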

Generalized Polyak step size for first order optimization with momentum

X Wang, M Johansson, T Zhang - … Conference on Machine …, 2023 - proceedings.mlr.press
In machine learning applications, it is well known that carefully designed learning rate (step
size) schedules can significantly improve the convergence of commonly used first-order …
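
The classical Polyak step size that the title generalizes picks the stepsize from the current suboptimality gap (the momentum-based generalization proposed in the paper is not reproduced here):

    \eta_k = \frac{f(x_k) - f^{\star}}{\|\nabla f(x_k)\|^2},

where f^{\star} is the optimal value or a known lower bound on it.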