Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives
Transformer, one of the latest technological advances of deep learning, has gained
prevalence in natural language processing and computer vision. Since medical imaging bears …
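The Transformer referenced here is built on scaled dot-product attention; as standard background (not specific to this review), the core operation is

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,$$

where $Q$, $K$, and $V$ are the query, key, and value matrices and $d_k$ is the key dimension.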
Directional convergence and alignment in deep learning
Z Ji, M Telgarsky - Advances in Neural Information …, 2020 - proceedings.neurips.cc
In this paper, we show that although the minimizers of cross-entropy and related
classification losses are off at infinity, network weights learned by gradient flow converge in …
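The result being described admits a compact statement (paraphrased here as background, assuming the paper's gradient-flow setting): although the loss has no finite minimizer and the weights grow unboundedly, the normalized iterates stabilize,

$$\dot{w}(t) = -\nabla \mathcal{L}(w(t)), \qquad \|w(t)\| \to \infty, \qquad \frac{w(t)}{\|w(t)\|} \to \bar{w},$$

and alignment means the negative gradient $-\nabla \mathcal{L}(w(t))$ also converges in direction to the same $\bar{w}$.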
Towards understanding sharpness-aware minimization
M Andriushchenko… - … Conference on Machine …, 2022 - proceedings.mlr.press
Sharpness-Aware Minimization (SAM) is a recent training method that relies on
worst-case weight perturbations and significantly improves generalization in various …
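To make the worst-case weight perturbation concrete, here is a minimal NumPy sketch of one SAM step on a toy loss; the loss, step size, and radius rho are illustrative choices, not the paper's experimental setup.

```python
import numpy as np

def loss(w):
    # Toy non-convex loss standing in for a network's training loss.
    return float(np.sum(w ** 4 - 2 * w ** 2))

def grad(w):
    return 4 * w ** 3 - 4 * w

def sam_step(w, lr=0.05, rho=0.1):
    """One Sharpness-Aware Minimization step:
    ascend to the (first-order) worst-case perturbation inside an
    L2 ball of radius rho, then descend using the gradient there."""
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # worst-case perturbation
    return w - lr * grad(w + eps)                # update the original weights

w = np.array([0.5, -1.5])
for _ in range(200):
    w = sam_step(w)
print(w, loss(w))  # settles near a minimizer of the toy loss
```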
SWAD: Domain generalization by seeking flat minima
Domain generalization (DG) methods aim to achieve generalizability to an unseen
target domain by using only training data from the source domains. Although a variety of DG …
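The flat-minima mechanism in SWAD is dense stochastic weight averaging over a window of training iterates; a minimal sketch of that averaging (omitting SWAD's validation-based selection of the window, which is an assumption left out here) might look like:

```python
import numpy as np

def running_average(avg_w, new_w, n_seen):
    """Incrementally average weight snapshots collected every iteration.
    Averaged iterates tend to sit in flatter regions of the loss surface."""
    return avg_w + (new_w - avg_w) / (n_seen + 1)

# Hypothetical usage with stand-in weight vectors for the averaging window.
avg = np.zeros(10)
for n, snapshot in enumerate(np.random.randn(50, 10)):
    avg = running_average(avg, snapshot, n)
```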
Effect of data encoding on the expressive power of variational quantum-machine-learning models
Quantum computers can be used for supervised learning by treating parametrized quantum
circuits as models that map data inputs to predictions. While a lot of work has been done to …
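The paper's central result can be summarized as follows: a model built from a parametrized circuit $U(x,\theta)$ with measurement $M$ is a partial Fourier series in the data,

$$f_{\theta}(x) = \langle 0 |\, U^{\dagger}(x, \theta)\, M\, U(x, \theta)\, | 0 \rangle = \sum_{\omega \in \Omega} c_{\omega}(\theta)\, e^{i\omega x},$$

where the frequency set $\Omega$ is fixed by the eigenvalues of the data-encoding Hamiltonians and only the coefficients $c_{\omega}$ depend on the trainable parameters.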
What is being transferred in transfer learning?
B Neyshabur, H Sedghi… - Advances in neural …, 2020 - proceedings.neurips.cc
One desired capability for machines is the ability to transfer their understanding of one
domain to another domain where data is (usually) scarce. Despite ample adaptation of …
ASAM: Adaptive sharpness-aware minimization for scale-invariant learning of deep neural networks
Recently, learning algorithms motivated by the sharpness of the loss surface as an effective
measure of the generalization gap have shown state-of-the-art performance. Nevertheless …
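The "Nevertheless" points at SAM's sensitivity to parameter rescaling, which ASAM addresses by normalizing the perturbation elementwise by parameter magnitude. A minimal sketch under that reading, with the toy gradient and the small stabilizer eta as illustrative assumptions:

```python
import numpy as np

def asam_step(w, grad, lr=0.05, rho=0.5, eta=0.01):
    """One adaptive sharpness-aware step: scale the ascent direction by
    t = |w| + eta so the perturbation respects each parameter's scale,
    making the sharpness measure invariant to weight rescaling."""
    t = np.abs(w) + eta
    g = grad(w)
    tg = t * g
    eps = rho * t * tg / (np.linalg.norm(tg) + 1e-12)  # scaled perturbation
    return w - lr * grad(w + eps)

# Hypothetical usage with the same toy gradient as the SAM sketch above.
w = np.array([0.5, -1.5])
for _ in range(200):
    w = asam_step(w, lambda v: 4 * v ** 3 - 4 * v)
print(w)
```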
Understanding gradient descent on the edge of stability in deep learning
Deep learning experiments by Cohen et al. (2021) using deterministic Gradient
Descent (GD) revealed an Edge of Stability (EoS) phase when learning rate (LR) and …
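The stability threshold behind the EoS phenomenon is visible already on a quadratic: gradient descent with learning rate lr on L(x) = 0.5 * c * x**2 updates x <- (1 - lr*c) * x, so it diverges exactly when the curvature c exceeds 2/lr. A small demo of that classical fact (not the paper's neural-network analysis):

```python
def gd_on_quadratic(curvature, lr=0.1, steps=30, x0=1.0):
    """Run gradient descent on L(x) = 0.5 * curvature * x**2."""
    x = x0
    for _ in range(steps):
        x -= lr * curvature * x  # x <- (1 - lr * curvature) * x
    return abs(x)

print(gd_on_quadratic(curvature=19.0))  # curvature < 2/lr = 20: shrinks toward 0
print(gd_on_quadratic(curvature=21.0))  # curvature > 2/lr: |x| grows without bound
```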
What neural networks memorize and why: Discovering the long tail via influence estimation
Deep learning algorithms are well-known to have a propensity for fitting the training data
very well and often fit even outliers and mislabeled data points. Such fitting requires …
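The influence-estimation approach rests on a memorization score for each training example; in the form used in this line of work, for learning algorithm $\mathcal{A}$ and training set $S$,

$$\mathrm{mem}(\mathcal{A}, S, i) = \Pr_{h \sim \mathcal{A}(S)}\bigl[h(x_i) = y_i\bigr] \;-\; \Pr_{h \sim \mathcal{A}(S \setminus \{i\})}\bigl[h(x_i) = y_i\bigr],$$

estimated in practice by training many models on random subsets rather than by exact leave-one-out retraining.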
Bayesian deep learning and a probabilistic perspective of generalization
AG Wilson, P Izmailov - Advances in neural information …, 2020 - proceedings.neurips.cc
The key distinguishing property of a Bayesian approach is marginalization, rather than using
a single setting of weights. Bayesian marginalization can particularly improve the accuracy …
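Concretely, marginalization replaces a point estimate of the weights with an average over the posterior,

$$p(y \mid x, \mathcal{D}) = \int p(y \mid x, w)\, p(w \mid \mathcal{D})\, dw \;\approx\; \frac{1}{J} \sum_{j=1}^{J} p(y \mid x, w_j), \qquad w_j \sim p(w \mid \mathcal{D}),$$

which is the sense in which the paper views approaches like deep ensembles as approximate Bayesian model averages.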